{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Taxi demand prediction in New York City\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "'''\n", "The following libraries must be installed:\n", " 1. dask\n", " 2. graphviz\n", " 3. toolz\n", " 4. cloudpickle\n", " 5. folium\n", " 6. gpxpy\n", " 7. xgboost\n", " \n", " Download MinGW-w64: https://mingw-w64.org/doku.php/download/mingw-builds\n", " Install it and note the install path, mingw_path = 'installed path', for example:\n", " mingw_path = 'C:\\\\Program Files\\\\mingw-w64\\\\x86_64-5.3.0-posix-seh-rt_v5-rev0\\\\mingw64\\\\bin'\n", "'''\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pylab as plt\n", "%matplotlib inline\n", "import seaborn as sns\n", "#import matplotlib\n", "# matplotlib.use('nbagg'): this backend makes plots interactive (e.g. zoom in and zoom out)\n", "#matplotlib.use('nbagg')\n", "#from matplotlib import rcParams # Size of plots \n", "\n", "# import dask dataframe\n", "import dask.dataframe as dd # similar to pandas but provides distributed and parallel access\n", "\n", "'''\n", "References for Dask:\n", " 1. https://www.youtube.com/watch?v=ieW3G7ZzRZ0 and https://github.com/dask/dask-tutorial\n", " 2. https://github.com/dask/dask-tutorial/blob/master/07_dataframe.ipynb\n", " 3. https://www.youtube.com/watch?v=mbfsog3e5DA\n", "'''\n", "\n", "import folium # open street map\n", "\n", "# used to calculate the straight-line distance between two (lat, lon) pairs in miles\n", "#import gpxpy.geo # Get the haversine distance\n", "\n", "# Miscellaneous\n", "import math\n", "import pickle\n", "import warnings\n", "warnings.filterwarnings(\"ignore\")\n", "\n", "# Convert to unix time\n", "import datetime\n", "import time\n", "# Reference for unix timestamp => https://www.unixtimestamp.com/\n", "\n", "# Models\n", "from sklearn.cluster import MiniBatchKMeans, KMeans # Clustering\n", "from sklearn.linear_model import LinearRegression\n", "from sklearn.ensemble import RandomForestRegressor\n", "import os\n", "mingw_path = 'C:\\\\Program Files\\\\mingw-w64\\\\x86_64-7.2.0-posix-seh-rt_v5-rev0\\\\mingw64\\\\bin'\n", "os.environ['PATH'] = mingw_path + ';' + os.environ['PATH']\n", "import xgboost as xgb\n", "\n", "# Evaluation Metrics\n", "from sklearn.metrics import mean_squared_error\n", "from sklearn.metrics import mean_absolute_error" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Information" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "Get the data from: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml (Jan 2015 and Jan 2016 data)\n", "The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) \n", "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Information on taxis:\n", "Overview of the Technology Passenger Enhancements Project (T-PEP)\n", "\n", "
Yellow Taxi: Yellow Medallion Taxicabs
\n", "

These are the famous NYC yellow taxis that provide transportation exclusively through street-hails. The number of taxicabs is limited by a finite number of medallions issued by the TLC. You access this mode of transportation by standing in the street and hailing an available taxi with your hand. The pickups are not pre-arranged.

\n", "\n", "
For Hire Vehicles (FHVs)
\n", "

FHV transportation is accessed by a pre-arrangement with a dispatcher or limo company. These FHVs are not permitted to pick up passengers via street hails, as those rides are not considered pre-arranged.

\n", "\n", "
Green Taxi: Street Hail Livery (SHL)
\n", "

The SHL program will allow livery vehicle owners to license and outfit their vehicles with green borough taxi branding, meters, credit card machines, and ultimately the right to accept street hails in addition to pre-arranged rides.

\n", "

Credits: Quora

\n", "\n", "
Footnote:
\n", "In this notebook we consider only yellow taxis, for the time periods Jan 2015 and Jan 2016." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Collection\n", "The table below summarizes the yellow taxi trip data files from Jan 2015 to Dec 2016.\n", "\n", "
| File name | File size | Number of records | Number of features |
| --- | --- | --- | --- |
| yellow_tripdata_2016-01 | 1.59 GB | 10906858 | 19 |
| yellow_tripdata_2016-02 | 1.66 GB | 11382049 | 19 |
| yellow_tripdata_2016-03 | 1.78 GB | 12210952 | 19 |
| yellow_tripdata_2016-04 | 1.74 GB | 11934338 | 19 |
| yellow_tripdata_2016-05 | 1.73 GB | 11836853 | 19 |
| yellow_tripdata_2016-06 | 1.62 GB | 11135470 | 19 |
| yellow_tripdata_2016-07 | 884 MB | 10294080 | 17 |
| yellow_tripdata_2016-08 | 854 MB | 9942263 | 17 |
| yellow_tripdata_2016-09 | 870 MB | 10116018 | 17 |
| yellow_tripdata_2016-10 | 933 MB | 10854626 | 17 |
| yellow_tripdata_2016-11 | 868 MB | 10102128 | 17 |
| yellow_tripdata_2016-12 | 897 MB | 10449408 | 17 |
| yellow_tripdata_2015-01 | 1.84 GB | 12748986 | 19 |
| yellow_tripdata_2015-02 | 1.81 GB | 12450521 | 19 |
| yellow_tripdata_2015-03 | 1.94 GB | 13351609 | 19 |
| yellow_tripdata_2015-04 | 1.90 GB | 13071789 | 19 |
| yellow_tripdata_2015-05 | 1.91 GB | 13158262 | 19 |
| yellow_tripdata_2015-06 | 1.79 GB | 12324935 | 19 |
| yellow_tripdata_2015-07 | 1.68 GB | 11562783 | 19 |
| yellow_tripdata_2015-08 | 1.62 GB | 11130304 | 19 |
| yellow_tripdata_2015-09 | 1.63 GB | 11225063 | 19 |
| yellow_tripdata_2015-10 | 1.79 GB | 12315488 | 19 |
| yellow_tripdata_2015-11 | 1.65 GB | 11312676 | 19 |
| yellow_tripdata_2015-12 | 1.67 GB | 11460573 | 19 |
" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['VendorID', 'tpep_pickup_datetime', 'tpep_dropoff_datetime',\n", " 'passenger_count', 'trip_distance', 'pickup_longitude',\n", " 'pickup_latitude', 'RateCodeID', 'store_and_fwd_flag',\n", " 'dropoff_longitude', 'dropoff_latitude', 'payment_type', 'fare_amount',\n", " 'extra', 'mta_tax', 'tip_amount', 'tolls_amount',\n", " 'improvement_surcharge', 'total_amount'],\n", " dtype='object')\n" ] } ], "source": [ "# Looking at the features\n", "month = dd.read_csv('C:\\\\Users\\\\HARSHALL\\\\Desktop\\\\Harshall\\\\Courses\\\\Applied AI\\\\Case Studies\\\\Taxi Demand Prediction\\\\Data\\\\yellow_tripdata_2015-01.csv')\n", "print(month.columns)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " VendorID tpep_pickup_datetime tpep_dropoff_datetime passenger_count \\\n", "0 2 2015-01-15 19:05:39 2015-01-15 19:23:42 1 \n", "1 1 2015-01-10 20:33:38 2015-01-10 20:53:28 1 \n", "2 1 2015-01-10 20:33:38 2015-01-10 20:43:41 1 \n", "3 1 2015-01-10 20:33:39 2015-01-10 20:35:31 1 \n", "4 1 2015-01-10 20:33:39 2015-01-10 20:52:58 1 \n", "\n", " trip_distance pickup_longitude pickup_latitude RateCodeID \\\n", "0 1.59 -73.993896 40.750111 1 \n", "1 3.30 -74.001648 40.724243 1 \n", "2 1.80 -73.963341 40.802788 1 \n", "3 0.50 -74.009087 40.713818 1 \n", "4 3.00 -73.971176 40.762428 1 \n", "\n", " store_and_fwd_flag dropoff_longitude dropoff_latitude payment_type \\\n", "0 N -73.974785 40.750618 1 \n", "1 N -73.994415 40.759109 1 \n", "2 N -73.951820 40.824413 2 \n", "3 N -74.004326 40.719986 2 \n", "4 N -74.004181 40.742653 2 \n", "\n", " fare_amount extra mta_tax tip_amount tolls_amount \\\n", "0 12.0 1.0 0.5 3.25 0.0 \n", "1 14.5 0.5 0.5 2.00 0.0 \n", "2 9.5 0.5 0.5 0.00 0.0 \n", "3 3.5 0.5 0.5 0.00 0.0 \n", "4 15.0 0.5 0.5 0.00 0.0 \n", "\n", " improvement_surcharge total_amount \n", "0 0.3 17.05 \n", "1 0.3 17.80 \n", "2 0.3 10.80 \n", "3 0.3 4.80 \n", "4 0.3 16.30 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "month.head()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "12748986" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(month)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are about 12.7 million (1 crore and 27 lakh) data points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Features in the dataset:\n", "\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", 
"\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\n", "\t\n", "
| Field Name | Description |
| --- | --- |
| VendorID | A code indicating the TPEP provider that provided the record: 1 = Creative Mobile Technologies, 2 = VeriFone Inc. |
| tpep_pickup_datetime | The date and time when the meter was engaged. |
| tpep_dropoff_datetime | The date and time when the meter was disengaged. |
| Passenger_count | The number of passengers in the vehicle. This is a driver-entered value. |
| Trip_distance | The elapsed trip distance in miles reported by the taximeter. |
| Pickup_longitude | Longitude where the meter was engaged. |
| Pickup_latitude | Latitude where the meter was engaged. |
| RateCodeID | The final rate code in effect at the end of the trip: 1 = Standard rate, 2 = JFK, 3 = Newark, 4 = Nassau or Westchester, 5 = Negotiated fare, 6 = Group ride. |
| Store_and_fwd_flag | This flag indicates whether the trip record was held in vehicle memory before sending to the vendor, aka "store and forward," because the vehicle did not have a connection to the server. Y = store and forward trip, N = not a store and forward trip. |
| Dropoff_longitude | Longitude where the meter was disengaged. |
| Dropoff_latitude | Latitude where the meter was disengaged. |
| Payment_type | A numeric code signifying how the passenger paid for the trip: 1 = Credit card, 2 = Cash, 3 = No charge, 4 = Dispute, 5 = Unknown, 6 = Voided trip. |
| Fare_amount | The time-and-distance fare calculated by the meter. |
| Extra | Miscellaneous extras and surcharges. Currently, this only includes the 0.50 USD and 1 USD rush hour and overnight charges. |
| MTA_tax | 0.50 USD MTA tax that is automatically triggered based on the metered rate in use. |
| Improvement_surcharge | 0.30 USD improvement surcharge assessed on trips at the flag drop. The improvement surcharge began being levied in 2015. |
| Tip_amount | Tip amount. This field is automatically populated for credit card tips. Cash tips are not included. |
| Tolls_amount | Total amount of all tolls paid in the trip. |
| Total_amount | The total amount charged to passengers. Does not include cash tips. |
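
The coded fields described above (`VendorID`, `RateCodeID`, `payment_type`) are easier to read during EDA once they are decoded. A minimal sketch, assuming the codes start at 1 and follow the order in which the values are listed above; the dictionaries and the `decode_codes` helper are illustrative names, not part of the original notebook:

```python
import pandas as pd

# Code tables transcribed from the field descriptions above.
# Assumption: codes start at 1 and follow the listed order.
VENDOR = {1: "Creative Mobile Technologies", 2: "VeriFone Inc."}
RATE_CODE = {
    1: "Standard rate", 2: "JFK", 3: "Newark",
    4: "Nassau or Westchester", 5: "Negotiated fare", 6: "Group ride",
}
PAYMENT_TYPE = {
    1: "Credit card", 2: "Cash", 3: "No charge",
    4: "Dispute", 5: "Unknown", 6: "Voided trip",
}


def decode_codes(df):
    """Return a copy of df with readable label columns added (hypothetical helper)."""
    out = df.copy()
    out["vendor_name"] = out["VendorID"].map(VENDOR)
    out["rate_code_name"] = out["RateCodeID"].map(RATE_CODE)
    out["payment_type_name"] = out["payment_type"].map(PAYMENT_TYPE)
    return out


# Tiny hand-made sample; unknown codes would map to NaN.
sample = pd.DataFrame(
    {"VendorID": [2, 1, 1], "RateCodeID": [1, 1, 2], "payment_type": [1, 2, 2]}
)
decoded = decode_codes(sample)
print(decoded[["vendor_name", "rate_code_name", "payment_type_name"]])
```

On the full dataset the same mapping could be applied after the relevant columns have been loaded into pandas; codes missing from a dictionary simply become `NaN`, which is a quick way to spot unexpected values.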
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# ML Problem Formulation\n", "

Time-series forecasting and Regression

\n", "
\n", "- Predict the number of pickups, given location coordinates (latitude and longitude) and time, in the query region and surrounding regions.\n", "

\n", "To solve this, we use data collected in Jan 2015 to predict the pickups in Jan 2016.\n", "
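
The prediction target described above can be sketched as a count of pickups per region and time bin. This is only a sketch under stated assumptions: the 10-minute bin width, the precomputed `region_id` column, and the `pickups_per_bin` helper are illustrative choices that this section does not fix; `pickup_times` is assumed to hold unix seconds, as in the data frame built later in this notebook.

```python
import pandas as pd

BIN_SECONDS = 600  # assumed bin width (10 minutes); not specified in this section


def pickups_per_bin(df, start_unix):
    """Count pickups per (region_id, time_bin).

    df needs 'pickup_times' (unix seconds) and 'region_id' columns;
    start_unix is the unix timestamp at which bin 0 begins.
    (Hypothetical helper, for illustration only.)
    """
    binned = df.copy()
    binned["time_bin"] = ((binned["pickup_times"] - start_unix) // BIN_SECONDS).astype(int)
    return binned.groupby(["region_id", "time_bin"]).size().reset_index(name="pickups")


# Toy data: six pickups spread over two regions and three time bins.
toy = pd.DataFrame(
    {
        "pickup_times": [0, 30, 599, 600, 1250, 1300],
        "region_id": [0, 0, 1, 0, 1, 1],
    }
)
counts = pickups_per_bin(toy, start_unix=0)
print(counts)
```

Each row of `counts` is one (region, time bin) cell; a regression or time-series model would then be trained to predict the `pickups` value of a cell from its history.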

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Performance metrics\n", "1. Mean Absolute percentage error.\n", "2. Mean Squared error." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## EDA and Data Cleaning\n", "\n", "In this section we will be doing univariate analysis and removing outlier/illegitimate values which may be caused due to some error" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " VendorID tpep_pickup_datetime tpep_dropoff_datetime passenger_count \\\n", "0 2 2015-01-15 19:05:39 2015-01-15 19:23:42 1 \n", "1 1 2015-01-10 20:33:38 2015-01-10 20:53:28 1 \n", "2 1 2015-01-10 20:33:38 2015-01-10 20:43:41 1 \n", "3 1 2015-01-10 20:33:39 2015-01-10 20:35:31 1 \n", "4 1 2015-01-10 20:33:39 2015-01-10 20:52:58 1 \n", "5 1 2015-01-10 20:33:39 2015-01-10 20:53:52 1 \n", "6 1 2015-01-10 20:33:39 2015-01-10 20:58:31 1 \n", "7 1 2015-01-10 20:33:39 2015-01-10 20:42:20 3 \n", "8 1 2015-01-10 20:33:39 2015-01-10 21:11:35 3 \n", "9 1 2015-01-10 20:33:40 2015-01-10 20:40:44 2 \n", "\n", " trip_distance pickup_longitude pickup_latitude RateCodeID \\\n", "0 1.59 -73.993896 40.750111 1 \n", "1 3.30 -74.001648 40.724243 1 \n", "2 1.80 -73.963341 40.802788 1 \n", "3 0.50 -74.009087 40.713818 1 \n", "4 3.00 -73.971176 40.762428 1 \n", "5 9.00 -73.874374 40.774048 1 \n", "6 2.20 -73.983276 40.726009 1 \n", "7 0.80 -74.002663 40.734142 1 \n", "8 18.20 -73.783043 40.644356 2 \n", "9 0.90 -73.985588 40.767948 1 \n", "\n", " store_and_fwd_flag dropoff_longitude dropoff_latitude payment_type \\\n", "0 N -73.974785 40.750618 1 \n", "1 N -73.994415 40.759109 1 \n", "2 N -73.951820 40.824413 2 \n", "3 N -74.004326 40.719986 2 \n", "4 N -74.004181 40.742653 2 \n", "5 N -73.986977 40.758194 1 \n", "6 N -73.992470 40.749634 2 \n", "7 N -73.995010 40.726326 1 \n", "8 N -73.987595 40.759357 2 \n", "9 N -73.985916 40.759365 1 \n", "\n", " fare_amount extra mta_tax tip_amount tolls_amount \\\n", "0 12.0 1.0 0.5 3.25 0.00 \n", "1 14.5 0.5 0.5 2.00 0.00 \n", "2 9.5 0.5 0.5 0.00 0.00 \n", "3 3.5 0.5 0.5 0.00 0.00 \n", "4 15.0 0.5 0.5 0.00 0.00 \n", "5 27.0 0.5 0.5 6.70 5.33 \n", "6 14.0 0.5 0.5 0.00 0.00 \n", "7 7.0 0.5 0.5 1.66 0.00 \n", "8 52.0 0.0 0.5 0.00 5.33 \n", "9 6.5 0.5 0.5 1.55 0.00 \n", "\n", " improvement_surcharge total_amount \n", "0 0.3 17.05 \n", "1 0.3 17.80 \n", "2 0.3 10.80 \n", "3 0.3 4.80 \n", "4 0.3 16.30 \n", "5 0.3 40.33 \n", "6 0.3 
15.30 \n", "7 0.3 9.96 \n", "8 0.3 58.13 \n", "9 0.3 9.35 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# the table below shows a few data points along with all our features\n", "month.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Pickup Latitude and Pickup Longitude" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is inferred from the source https://www.flickr.com/places/info/2459115 that New York is bounded by the location coordinates (lat, long) (40.5774, -74.15) and (40.9176, -73.7004). Any pickup outside this bounding box is discarded, since we are only concerned with pickups that originate within New York.\n", "\n", "#### Plotting pickup coordinates that lie outside the bounding box of New York" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# collect all the points outside the bounding box of New York City into outlier_locations\n", "outlier_locations = month[((month.pickup_longitude <= -74.15) | (month.pickup_latitude <= 40.5774)| \\\n", " (month.pickup_longitude >= -73.7004) | (month.pickup_latitude >= 40.9176))]" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "247742" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(outlier_locations)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are about 247k data points that lie outside the bounding box of New York" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **Creating a map with a base location**\n", "* Read more about folium here: **http://folium.readthedocs.io/en/latest/quickstart.html**\n", "* **Note:** you don't need to remember any of this, and you don't need in-depth knowledge of these maps and plots" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "scrolled": true }, "outputs": [ { "data": { 
"text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Below are the approximate latitude and longitude of Central New York\n", "map_osm = folium.Map(location=[40.734695, -73.990372], tiles='Stamen Toner')\n", "\n", "# plot only the first 10000 outliers on the map; plotting all of them would take much longer\n", "sample_locations = outlier_locations.head(10000)\n", "for i,j in sample_locations.iterrows():\n", "    if int(j['pickup_latitude']) != 0:\n", "        folium.Marker(list((j['pickup_latitude'],j['pickup_longitude']))).add_to(map_osm)\n", "map_osm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Observation:- As the map above shows, some points lie just outside the boundary, while a few are in South America, Mexico, or Canada" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Dropoff Latitude & Dropoff Longitude\n", "#### We do a similar analysis for dropoff points" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "outlier_locations = month[((month.dropoff_longitude <= -74.15) | (month.dropoff_latitude <= 40.5774)| \\\n", " (month.dropoff_longitude >= -73.7004) | (month.dropoff_latitude >= 40.9176))]\n", "\n", "map_osm = folium.Map(location=[40.734695, -73.990372], tiles='Stamen Toner')\n", "\n", "# plot only the first 10000 outliers on the map; plotting all of them would take much longer\n", "sample_locations = outlier_locations.head(10000)\n", "for i,j in sample_locations.iterrows():\n", "    # skip rows whose dropoff latitude is recorded as 0\n", "    if int(j['dropoff_latitude']) != 0:\n", "        folium.Marker(list((j['dropoff_latitude'],j['dropoff_longitude']))).add_to(map_osm)\n", "map_osm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Observation:- The observations here are similar to those obtained while analysing pickup latitude and longitude" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Trip Durations:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

According to NYC Taxi & Limousine Commission regulations, the maximum allowed trip duration in a 24-hour interval is 12 hours.
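
The 12-hour rule gives a simple validity check on trip duration. A minimal sketch, assuming durations are stored in minutes in a `trip_times` column (the name used later in this notebook); the helper name and the decision to also drop non-positive durations are assumptions made for illustration:

```python
import pandas as pd

MAX_TRIP_MINUTES = 12 * 60  # the 12-hour limit quoted above, expressed in minutes


def keep_plausible_durations(df, col="trip_times"):
    """Keep rows whose duration is positive and at most 12 hours (hypothetical helper)."""
    mask = (df[col] > 0) & (df[col] <= MAX_TRIP_MINUTES)
    return df[mask]


# Toy durations in minutes: negative, zero, typical, near the limit, at the limit, beyond it.
toy = pd.DataFrame({"trip_times": [-5.0, 0.0, 18.05, 719.9, 720.0, 1500.0]})
cleaned = keep_plausible_durations(toy)
print(len(toy) - len(cleaned), "rows removed")
```

Whether a trip of exactly 12 hours should be kept is a judgment call; the sketch keeps it by using `<=`.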

" ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# The timestamps are converted to unix so as to get duration(trip-time) & speed\n", "# Also pickup-times in unix are used while binning\n", "\n", "# in our data we have time in the format \"YYYY-MM-DD HH:MM:SS\"\n", "# we convert this string to python time format and then into unix time stamp\n", "# https://stackoverflow.com/a/27914405\n", "def convert_to_unix(s):\n", " return time.mktime(datetime.datetime.strptime(s, \"%Y-%m-%d %H:%M:%S\").timetuple())\n", "\n", "\n", "\n", "# we return a data frame which contains the columns\n", "# 1.'passenger_count' : self explanatory\n", "# 2.'trip_distance' : self explanatory\n", "# 3.'pickup_longitude' : self explanatory\n", "# 4.'pickup_latitude' : self explanatory\n", "# 5.'dropoff_longitude' : self explanatory\n", "# 6.'dropoff_latitude' : self explanatory\n", "# 7.'total_amount' : total fair that was paid\n", "# 8.'trip_times' : duration of each trip\n", "# 9.'pickup_times : pickup time converted into unix time \n", "# 10.'Speed' : velocity of each trip\n", "def return_with_trip_times(month):\n", " duration = month[['tpep_pickup_datetime','tpep_dropoff_datetime']].compute()\n", " # pickups and dropoffs to unix time\n", " duration_pickup = [convert_to_unix(x) for x in duration['tpep_pickup_datetime'].values]\n", " duration_drop = [convert_to_unix(x) for x in duration['tpep_dropoff_datetime'].values]\n", " # calculate duration of trips in minutes\n", " durations = (np.array(duration_drop) - np.array(duration_pickup))/float(60)\n", "\n", " new_frame = month[['passenger_count','trip_distance','pickup_longitude','pickup_latitude','dropoff_longitude','dropoff_latitude','total_amount']].compute()\n", " \n", " # append durations of trips and speed in miles/hr to a new dataframe\n", " new_frame['trip_times'] = durations\n", " new_frame['pickup_times'] = duration_pickup ## Used for time binning later\n", " 
new_frame['Speed'] = 60*(new_frame['trip_distance']/new_frame['trip_times'])\n", " \n", " return new_frame\n", "\n", "# print(frame_with_durations.head())\n", "# passenger_count\ttrip_distance\tpickup_longitude\tpickup_latitude\tdropoff_longitude\tdropoff_latitude\ttotal_amount\ttrip_times\tpickup_times\tSpeed\n", "# 1 1.59\t -73.993896 \t40.750111 \t-73.974785 \t40.750618 \t17.05 \t 18.050000\t1.421329e+09\t5.285319\n", "# 1 \t3.30 \t-74.001648 \t40.724243 \t-73.994415 \t40.759109 \t17.80 \t19.833333\t1.420902e+09\t9.983193\n", "# 1 \t1.80 \t-73.963341 \t40.802788 \t-73.951820 \t40.824413 \t10.80 \t10.050000\t1.420902e+09\t10.746269\n", "# 1 \t0.50 \t-74.009087 \t40.713818 \t-74.004326 \t40.719986 \t4.80 \t1.866667\t1.420902e+09\t16.071429\n", "# 1 \t3.00 \t-73.971176 \t40.762428 \t-74.004181 \t40.742653 \t16.30 \t19.316667\t1.420902e+09\t9.318378\n", "frame_with_durations = return_with_trip_times(month)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Save the dataframe as pickle file for future refrences" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "frame_with_durations.to_pickle(\"Save/frame_with_durations\")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "frame_with_durations = pd.read_pickle(\"Save/frame_with_durations\")" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pandas.core.frame.DataFrame" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "type(frame_with_durations)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Outlier Detection - \"trip_times\"" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAZwAAADuCAYAAAAN3LFHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEypJREFUeJzt3X+MXtV95/H3Bzs0PxoKcQxCNqypcKuSbpuGKXE30S5L\nAjahG9OqkUC0HhIStylJWnWlLewfazZZaRPtbtMSJUSYIOyKLkFtWbxdYuqQkCoSSRg3TRxCI6aE\nlhEoGJsQWLSpjL/7x5yJHk/mx+Mfc+/M+P2SHj3P+d5zn3P8B/pw7j3PnVQVkiQttFP6noAk6eRg\n4EiSOmHgSJI6YeBIkjph4EiSOmHgSJI6YeBIkjph4EiSOmHgSJI6sbLvCSwmr3/962vdunV9T0OS\nlpS9e/c+W1Wr5+tn4AxYt24dY2NjfU9DkpaUJP84TD8vqUmSOmHgSJI6YeBIkjph4EiSOmHgSIvc\ntddey8UXX8x73/vevqciHRcDR1rknnjiCQDGx8f7nYh0nAwcaRG79tprj2i7ytFSZuBIi9jU6maK\nqxwtZQaOJKkTBo4kqRMGjrSITX+23/nnn9/PRKQTwMCRFrE77rjjiPZtt93Wz0SkE8DAkRa5qVWO\nqxstdT4tWlrkpq9ypKXKFY4kqRMGjiSpEwaOJKkTBo4kqRMGjiSpEwaOJKkTBo4kqRMGjiSpEwse\nOEmeSLIvyd8lGWu11yXZk+Sx9n5GqyfJzUnGk3wzyZsGvme09X8syehA/cL2/ePt3Mw1hiSpH12t\ncP5tVb2xqkZa+wbggapaDzzQ2gCXA+vbaytwC0yGB7ANeDNwEbBtIEBuaX2nzts0zxiSpB70dUlt\nM7Cjfd4BXDlQ31mTvgKcnuRsYCOwp6oOVtVzwB5gUzt2WlU9VFUF7Jz2XTONIUnqQReBU8BfJ9mb\nZGurnVVVTwO09zNbfQ3w5MC5E602V31ihvpcYxwhydYkY0nG9u/ff4z/REnSfLp4eOdbquqpJGcC\ne5L8/Rx9M0OtjqE+tKq6FbgVYGRk5KjOlSQNb8FXOFX1VHt/BriHyXsw32uXw2jvz7TuE8A5A6ev\nBZ6ap752hjpzjCFJ6sGCBk6S1yR57dRn4DLgW8AuYGqn2Shwb/u8C9jSdqttAJ5vl8PuBy5Lckbb\nLHAZcH879kKSDW132pZp3zXTGJKkHiz0JbWzgHvaTuWVwJ9V1e4kDwN3J7kO+CfgXa3/fcA7gHHg\nJeDdAFV1MMlHgIdbvw9X1cH2+f3AHcCrgM+1F8BHZxlDktSDTG7uEkzewxkbG+t7GpK0pCTZO/Cz\nl1n5pAFJUicMHElSJwwcSVInDBxJUicMHElSJwwcSVInDBxJUicMHElSJwwcSVInDBxJUicMHElS\nJwwcSVInDBxJUicMHElSJwwcSVInDBxJUicMHElSJwwcSVInDBxJUicMHElSJwwcSVInDBxJUicM\nHElSJwwcSVInDBxJUicMHElSJwwcSVInOgmcJCuSfD3JX7X2eUm+muSxJJ9Ncmqr/0Rrj7fj6wa+\n48ZW/06SjQP1Ta02nuSGgfqMY0iS+tHVCuf3gEcH2h8DPl5V64HngOta/Trguao6H/h460eSC4Cr\ngDcAm4BPtRBbAXwSuBy4ALi69Z1rDElSDxY8cJKsBa4AbmvtAJcAf9667ACubJ83tzbt+Nta/83A\nXVX1w6r6LjAOXNRe41X1eFX9M3AXsHmeMSRJPehihfPHwH8ADrf2KuD7VXWotSeANe3zGuBJgHb8\n+db/R/Vp58xWn2uMIyTZmmQsydj+/fuP9d8oSZrHggZOkl8FnqmqvYPlGbrWPMdOVP3Hi1W3VtVI\nVY2sXr16pi6SpBNg5QJ//1uAdyZ5B/BK4DQmVzynJ1nZViBrgada/wngHGAiyUrgp4CDA/Upg+fM\nVH92jjEkST1Y0BVOVd1YVWurah2TN/2/UFXXAF8EfqN1GwXub
Z93tTbt+Beqqlr9qraL7TxgPfA1\n4GFgfduRdmobY1c7Z7YxJEk96Ot3OH8I/EGScSbvt3ym1T8DrGr1PwBuAKiqR4C7gW8Du4Hrq+rl\ntnr5AHA/k7vg7m595xpDktSDTC4GBDAyMlJjY2N9T0OSlpQke6tqZL5+PmlAktQJA0eS1AkDR5LU\nCQNHktQJA0eS1AkDR5LUCQNHktQJA0eS1AkDR5LUCQNHktQJA0eS1AkDR5LUCQNHktQJA0eS1AkD\nR5LUCQNHktQJA0eS1ImhAifJa5Kc0j7/TJJ3JnnFwk5NkrScDLvC+RvglUnWAA8A7wbuWKhJSZKW\nn2EDJ1X1EvDrwCeq6teACxZuWpKk5WbowEnyK8A1wP9ptZULMyVJ0nI0bOD8PnAjcE9VPZLkp4Ev\nLty0JEnLzVCrlKr6EvClJK9p7ceBDy3kxCRJy8uwu9R+Jcm3gUdb+xeTfGpBZyZJWlaGvaT2x8BG\n4ABAVX0D+NcLNSlJ0vIz9A8/q+rJaaWXT/BcJEnL2LA7zZ5M8q+ASnIqk/dvHl24aUmSlpthVzi/\nA1wPrAEmgDe29pySvDLJ15J8I8kjSf5zq5+X5KtJHkvy2RZiJPmJ1h5vx9cNfNeNrf6dJBsH6pta\nbTzJDQP1GceQJPVjqMCpqmer6pqqOquqzqyq36yqA0Oc+kPgkqr6RSZDalOSDcDHgI9X1XrgOeC6\n1v864LmqOh/4eOtHkguAq4A3AJuATyVZkWQF8EngciZ/iHp168scY0iSejDsLrXzkvxRkr9Msmvq\nNd95NenF1nxFexVwCfDnrb4DuLJ93tzatONvS5JWv6uqflhV3wXGgYvaa7yqHq+qfwbuAja3c2Yb\nQ5LUg2Hv4fwv4DPA/wYOH80AbRWyFzifydXIPwDfr6pDrcsEk5fqaO9PAlTVoSTPA6ta/SsDXzt4\nzpPT6m9u58w2xvT5bQW2Apx77rlH80+TJB2FYQPn/1XVzccyQFW9DLwxyenAPcDPzdStvWeWY7PV\nZ1qhzdV/pvndCtwKMDIyMmMfSdLxGzZw/iTJNuCvmbwvA0BV/e2wA1XV95M8CGwATk+ysq1A1gJP\ntW4TwDnARJKVwE8BBwfqUwbPman+7BxjSJJ6MOwutX8JvA/4KPA/2uu/z3dSktVtZUOSVwFvZ3I7\n9ReB32jdRoF72+ddrU07/oWqqla/qu1iOw9YD3wNeBhY3+4xncrkxoJd7ZzZxpAk9WDYFc6vAT/d\nbswfjbOBHe0+zinA3VX1V+0xOXcl+S/A15m8P0R7/9Mk40yubK4CaA8MvRv4NnAIuL5dqiPJB4D7\ngRXA7VX1SPuuP5xlDElSDzK5GJinU/JZ4INV9czCT6k/IyMjNTY21vc0JGlJSbK3qkbm6zfsCucs\n4O+TPMyR93DeeYzzkySdZIYNnG0LOgtJs9q+fTt33nknW7Zs4T3veU/f05GO2VCX1E4WXlLTYnTx\nxRf/6PODDz7Y2zyk2Qx7SW3OXWpJvtzeX0jyg4HXC0l+cKImK2lm27dvP6J9++239zQT6fjNGThV\n9db2/tqqOm3g9dqqOq2bKUonrzvvvPOI9s6dO3uaiXT8hn2W2p8OU5MkaTbD/vDzDYON9hSAC0/8\ndCRJy9V893BuTPIC8AuD92+A7+Ev96UFd8011xzR3rJlS08zkY7ffPdw/mtVvRb4b9Pu36yqqhun\n+iV5wxxfI+kYve997zui7bZoLWXD/gG2G+fp4v0caYFMrXJc3WipOyG/w0ny9ar6pRMwn175OxxJ\nOnon5Hc4R8Ffj0qS5nSiAkeSpDmdqMA52j9bIEk6yQz78E6S/DrwViYvn325qu6ZOlZVGxZgbpKk\nZWTYJw18CvgdYB/wLeC3k3xyIScmSVpehl3h/Bvg59ufbibJDibDR5KkoQx7D+c7wLkD7XOAb574\n6UiSlqthVzirgEeTfK21f
xl4KMku8C9/SpLmN2zg/KcFnYUkadkbKnCq6ksLPRFJ0vI2Z+Ak+XJV\nvbU9IXrwaQIByj/CJkka1pyBM/gXP7uZjiRpuZp3l1qSU5J8q4vJSJKWr3kDp6oOA99Icu58fSVJ\nms2wu9TOBh5p26L/71TR7dCSpGENGzg/CfzqQDvAx078dCRJy9WwTxpYWVVfGng9CLxqvpOSnJPk\ni0keTfJIkt9r9dcl2ZPksfZ+Rqsnyc1JxpN8M8mbBr5rtPV/LMnoQP3CJPvaOTcnyVxjSJL6MWfg\nJHl/kn3Az7YAmHp9l+EebXMI+PdV9XPABuD6JBcANwAPVNV64IHWBrgcWN9eW4Fb2jxeB2wD3gxc\nBGwbCJBbWt+p8za1+mxjSJJ6MN8K58+Afwfsau9Trwur6jfn+/Kqerqq/rZ9fgF4FFgDbAZ2tG47\ngCvb583Azpr0FeD0JGcDG4E9VXWwqp4D9gCb2rHTquqh9mDRndO+a6YxJEk9mO93OM8DzwNXH+9A\nSdYBvwR8FTirqp5uYzyd5MzWbQ3w5MBpE602V31ihjpzjDF9XluZXCFx7rluxJOkhdLJn5hO8pPA\nXwC/X1U/mKvrDLU6hvrQqurWqhqpqpHVq1cfzamSpKOw4IGT5BVMhs2dVfWXrfy9djmM9v5Mq08w\n+acPpqwFnpqnvnaG+lxjSJJ6sKCB03aMfQZ4tKr+aODQLmBqp9kocO9AfUvbrbYBeL5dFrsfuCzJ\nGW2zwGXA/e3YC0k2tLG2TPuumcaQJPVg2N/hHKu3AL8F7Evyd632H4GPAncnuQ74J+Bd7dh9wDuA\nceAl4N0AVXUwyUeAh1u/D1fVwfb5/cAdTG7T/lx7MccYkqQepP3VaAEjIyM1NjbW9zQkaUlJsreq\nRubr18mmAUmSDBxJUicMHElSJwwcSVInDBxJUicMHElSJwwcSVInDBxJUicMHElSJwwcSVInDBxJ\nUicMHElSJwwcSVInDBxJUicMHElSJwwcSVInDBxJUicMHElSJwwcSVInDBxJUicMHElSJwwcSVIn\nDBxJUicMHElSJwwcSVInDBxJUicMHElSJxY0cJLcnuSZJN8aqL0uyZ4kj7X3M1o9SW5OMp7km0ne\nNHDOaOv/WJLRgfqFSfa1c25OkrnGkCT1Z6FXOHcAm6bVbgAeqKr1wAOtDXA5sL69tgK3wGR4ANuA\nNwMXAdsGAuSW1nfqvE3zjCFJ6smCBk5V/Q1wcFp5M7Cjfd4BXDlQ31mTvgKcnuRsYCOwp6oOVtVz\nwB5gUzt2WlU9VFUF7Jz2XTONIUnqSR/3cM6qqqcB2vuZrb4GeHKg30SrzVWfmKE+1xg/JsnWJGNJ\nxvbv33/M/yhJ0twW06aBzFCrY6gflaq6tapGqmpk9erVR3u6tODGxsa45JJL2Lt3b99TkY5LH4Hz\nvXY5jPb+TKtPAOcM9FsLPDVPfe0M9bnGkJacm266icOHD7Nt27a+pyIdlz4CZxcwtdNsFLh3oL6l\n7VbbADzfLofdD1yW5Iy2WeAy4P527IUkG9rutC3TvmumMaQlZWxsjBdffBGAF1980VWOlrSF3hb9\nP4GHgJ9NMpHkOuCjwKVJHgMubW2A+4DHgXFgO/C7AFV1EPgI8HB7fbjVAN4P3NbO+Qfgc60+2xjS\nknLTTTcd0XaVo6Vs5UJ+eVVdPcuht83Qt4DrZ/me24HbZ6iPAT8/Q/3ATGNIS83U6ma2trSULKZN\nA5KmWbly5ZxtaSkxcKRFbMWKFXO2paXEwJEWsY0bNx7R3rRp+oM7pKXDwJEWsdHR0R+talauXMmW\nLVt6npF07AwcaRFbtWoVV1xxBUm44oorWLVqVd9Tko6ZdyClRW50dJQnnnjC1Y2WPANHWuRWrVrF\nzTff3Pc0pOPmJTVJUicMHElSJwwcSVInDBxpkTtw4AAf+tCHOHDgQN9TkY6LgSMtcjt27GD
fvn3s\n3Lmz76lIx8XAkRaxAwcOsHv3bqqK3bt3u8rRkmbgSIvYjh07OHz4MAAvv/yyqxwtaQaOtIh9/vOf\n59ChQwAcOnSIPXv29Dwj6dgZONIi9va3v51TTpn8z/SUU07h0ksv7XlG0rEzcKRFbHR09EeX1A4f\nPuzjbbSkGTjSIrZ79+4j2l5S01Jm4EiL2Pbt249of/rTn+5pJtLxM3AkSZ0wcCRJnTBwJEmdMHAk\nSZ0wcCRJnTBwJEmdMHAkSZ0wcCRJnVjWgZNkU5LvJBlPckPf85Gkk9myDZwkK4BPApcDFwBXJ7mg\n31lJ0slrZd8TWEAXAeNV9ThAkruAzcC3e53VHD7xiU/82LOzTlYvvfQSVdX3NBaliy++uO8p9CoJ\nr371q/uexqKwadMmPvjBD/Y9jaEt2xUOsAZ4cqA90WpHSLI1yViSsf3793c2OUk62WS5/l9kkncB\nG6vqva39W8BFVTXr/w6MjIzU2NhYV1OU5jXTaubBBx/sfB7SXJLsraqR+fot5xXOBHDOQHst8FRP\nc5Gkk95yDpyHgfVJzktyKnAVsKvnOUlHZfpqxtWNlrJlu2mgqg4l+QBwP7ACuL2qHul5WpJ00lq2\ngQNQVfcB9/U9D+l4uKrRcrGcL6lJkhYRA0eS1AkDR5LUCQNHktSJZfvDz2ORZD/wj33PQ5rB64Fn\n+56ENIt/UVWr5+tk4EhLQJKxYX7JLS1mXlKTJHXCwJEkdcLAkZaGW/uegHS8vIcjSeqEKxxJUicM\nHElSJwwcSVInDBxJUicMHElSJ/4/5iL0/VLIM0cAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# the skewed box plot shows us the presence of outliers \n", "sns.boxplot(y=\"trip_times\", data = frame_with_durations)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Observations:**\n", "* Due to presence of extreme outliers, its difficult to make sense of the box plot\n", "* We can instead try to check percentiles.\n", "\n", "Sorting the \"trip_times\" and calculating percentiles" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": true }, "outputs": [], "source": [ "var = frame_with_durations[\"trip_times\"].values\n", "var = np.sort(var)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 percentile value is -1211.0166666666667\n", "10 percentile value is 3.8333333333333335\n", "20 percentile value is 5.383333333333334\n", "30 percentile value is 6.816666666666666\n", "40 percentile value is 8.3\n", "50 percentile value is 9.95\n", "60 percentile value is 11.866666666666667\n", "70 percentile 
value is 14.283333333333333\n", "80 percentile value is 17.633333333333333\n", "90 percentile value is 23.45\n", "100 percentile value is 548555.633333\n" ] } ], "source": [ "# calculating the 0th to 100th percentiles to find a suitable cutoff for removing outliers\n", "for i in range(0,100,10):\n", "    print(\"{} percentile value is {}\".format(i,var[int(len(var)*(i/100))]))\n", "print(\"100 percentile value is \",var[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Observations:**\n", "* Some trip times are negative and must be discarded\n", "* At the other extreme, some trip times are as long as 548555 minutes (about 381 days), which clearly violates the TLC regulations" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "90 percentile value is 23.45\n", "91 percentile value is 24.35\n", "92 percentile value is 25.383333333333333\n", "93 percentile value is 26.55\n", "94 percentile value is 27.933333333333334\n", "95 percentile value is 29.583333333333332\n", "96 percentile value is 31.683333333333334\n", "97 percentile value is 34.46666666666667\n", "98 percentile value is 38.71666666666667\n", "99 percentile value is 46.75\n", "100 percentile value is 548555.633333\n" ] } ], "source": [ "# Zooming in from the 90th percentile to the 100th\n", "for i in range(90,100):\n", "    print(\"{} percentile value is {}\".format(i,var[int(len(var)*(i/100))]))\n", "print(\"100 percentile value is \",var[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Removing data based on our analysis and TLC regulations**\n", "* Retaining only trip times longer than 1 minute, which also discards all the negative ones.\n", "* Also retaining only trips shorter than 12 hrs (i.e. 
720 minutes)" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": true }, "outputs": [], "source": [ "frame_with_durations_modified = frame_with_durations[(frame_with_durations.trip_times>1) & (frame_with_durations.trip_times<720)]" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYkAAADuCAYAAADMW/vrAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAEVdJREFUeJzt3X2MZfVdx/H3B7ZYtNSFMiDZB7fNrs8KxUnd2sZi8aGg\nYamRhFplQzYZHxBpolHwD6uJiTUm1m5Fkk1Rl6ZPtBZZKz6QbdHUhNrBAgW2hpFYdliEgbZbdPsQ\n6Nc/5jflsjtn5u4yZ+7d3fcruTnnfO/v3Pud/WM/+Z1z7jmpKiRJWswpo25AkjS+DAlJUidDQpLU\nyZCQJHUyJCRJnQwJSVInQ0KS1MmQkCR1MiQkSZ3WjLqBF+vss8+uTZs2jboNSTqu3HPPPU9V1cRy\n4477kNi0aRPT09OjbkOSjitJPj/MOA83SZI6GRKSpE6GhCSpkyEhSepkSEg92LlzJxdddBE33njj\nqFuRXhRDQurBRz/6UQA+/OEPj7gT6cUxJKQVtnPnzhdsO5vQ8cyQkFbYwixigbMJHc8MCUlSJ0NC\nktSp15BI8t1J7h14fTnJ25KcleTOJA+35ZltfJLsTDKT5P4kF/bZnyRpab2GRFX9Z1VdUFUXAD8M\nHAJuA64H9lbVFmBv2wa4BNjSXlPATX32J0la2moebroY+K+q+jywDdjd6ruBy9v6NuCWmnc3sDbJ\neavYoyRpwGqGxJXAB9r6uVX1OEBbntPq64D9A/vMttoLJJlKMp1kem5urseWJenktiohkeQ04DJg\nuWsBs0itjihU7aqqyaqanJhY9nbokqRjtFoziUuA/6iqJ9r2EwuHkdryyVafBTYM7LceOLBKPUqS\nDrNaIfEWnj/UBLAH2N7WtwO3D9Svalc5bQUOLhyWkiStvt6fTJfkW4GfBH55oPwO4NYkO4BHgSta\n/Q7gUmCG+Suhru67P0lSt95DoqoOAa84rPY081c7HT62gGv67kmSNBx/cS1J6mRISJI6GRKSpE6G\nhCSpkyEhSepkSEiSOhkSkqROhoQkqZMhIUnqZEhIkjoZEpKkToaEJKmTISFJ6mRISJI6GRKSpE6G\nhCSpkyEhSepkSEiSOhkSkqROvYdEkrVJPpLkc0n2JXltkrOS3Jnk4bY8s41Nkp1JZpLcn+TCvvuT\nJHVbjZnEu4B/rKrvAc4H9gHXA3uraguwt20DXAJsaa8p4KZV6E+S1KHXkEjycuDHgJsBqurrVfUl\nYBuwuw3bDVze1rcBt9S8u4G1Sc7rs0dJUre+ZxKvAuaAv0rymSTvSfJtwLlV9ThAW57Txq8D9g/s\nP9tqL5BkKsl0kum5ubl+/wJJOon1HRJrgAuBm6rq1cD/8fyhpcVkkVodUajaVVWTVTU5MTGxMp1K\nko7Qd0jMArNV9am2/RHmQ+OJhcNIbfnkwPgNA/uvBw703KMkqUOvIVFV/wPsT/LdrXQx8BCwB9je\natuB29v6HuCqdpXTVuDgwmEpSdLqW7MK33Et8L4kpwG
PAFczH063JtkBPApc0cbeAVwKzACH2lhJ\n0oj0HhJVdS8wuchbFy8ytoBr+u5JkjQcf3EtSepkSEiSOhkSkqROhoQkqZMhIUnqZEhIkjoZEpKk\nToaEJKmTISFJ6mRISJI6GRKSpE6GhCSpkyEhSepkSEiSOhkSkqROhoQkqZMhIUnqZEhIkjr1HhJJ\n/jvJZ5Pcm2S61c5KcmeSh9vyzFZPkp1JZpLcn+TCvvuTJHVbrZnEj1fVBVW18Kzr64G9VbUF2Nu2\nAS4BtrTXFHDTKvUnSVrEqA43bQN2t/XdwOUD9Vtq3t3A2iTnjaJBSdLqhEQB/5zkniRTrXZuVT0O\n0JbntPo6YP/AvrOt9gJJppJMJ5mem5vrsXVJOrmtWYXveF1VHUhyDnBnks8tMTaL1OqIQtUuYBfA\n5OTkEe9LklZG7zOJqjrQlk8CtwGvAZ5YOIzUlk+24bPAhoHd1wMH+u5RkrS4XkMiybclOWNhHfgp\n4AFgD7C9DdsO3N7W9wBXtauctgIHFw5LSZJWX9+Hm84Fbkuy8F3vr6p/TPJp4NYkO4BHgSva+DuA\nS4EZ4BBwdc/9SZKW0GtIVNUjwPmL1J8GLl6kXsA1ffYkSRqev7iWJHUyJCRJnQwJSVInQ0KS1MmQ\nkCR1MiQkSZ0MCUlSJ0NCktTJkJAkdTIkJEmdhgqJdqO+U9r6dyW5LMlL+m1NkjRqw84k/hV4aZJ1\nzD9u9Grgr/tqSpI0HoYNiVTVIeDngHdX1ZuB7+uvLUnSOBg6JJK8Fngr8PetthpPtZMkjdCwIfE2\n4Abgtqp6MMmrgE/015YkaRwMNRuoqn8B/qU9XW7hORG/0WdjkqTRG/bqptcmeQjY17bPT/IXvXYm\nSRq5YQ83/Rnw08DTAFV1H/BjfTUlSRoPQ/+Yrqr2H1Z6boV7kSSNmWFDYn+SHwUqyWlJfot26GkY\nSU5N8pkkH2vbr0zyqSQPJ/lQktNa/Vva9kx7f9NR/j2SpBU0bEj8CnANsA6YBS5o28O6jheGyh8D\n76yqLcAXgR2tvgP4YlVtBt7ZxkmSRmSokKiqp6rqrVV1blWdU1W/WFVPD7NvkvXAzwDvadsB3gh8\npA3ZDVze1re1bdr7F7fxkqQRGOoS2CSvBK4FNg3uU1WXDbH7nwG/DZzRtl8BfKmqnm3bs8zPUGjL\n/e2zn01ysI1/6rB+poApgI0bNw7zJ0iSjsGwv5r+W+Bm4O+Abwz74Ul+Fniyqu5JctFCeZGhNcR7\nzxeqdgG7ACYnJ494X5K0MoYNia9W1c5j+PzXAZcluRR4KfBy5mcWa5OsabOJ9cCBNn4W2ADMJlkD\nfDvwhWP4XknSChj2xPW7kry9/ajuwoXXcjtV1Q1Vtb6qNgFXAh+vqrcyf0uPn2/DtgO3t/U9bZv2\n/serypmCJI3IsDOJHwR+ifkTzguHm6ptH4vfAT6Y5A+BzzB/KIu2fG+SGeZnEFce4+dLklbAsCHx\nZuBVVfX1Y/2iqroLuKutPwK8ZpExXwWuONbvkCStrGEPN90HrO2zEUnS+Bl2JnEu8Lkknwa+tlAc\n8hJYSdJxatiQeHuvXUiSxtLRPE9CknSSWTIkknyyql6f5Ble+KO2AFVVL++1O0nSSC0ZElX1+rY8\nY6lxkqQT07BPpnvvMDVJ0oll2Etgv39wo90y44dXvh1J0jhZMiSS3NDOR/xQki+31zPAEzx/Kw1J\n0glqyZCoqj9q5yP+pKpe3l5nVNUrquqGhXFJvn+Jj5EkHaeGfejQDcsM8fyEJJ2Ahj0nsRyfHidJ\nJ6CVCglv5y1JJ6CVCglJ0glopULimG8hLkkaX8Pe4I8kPwe8nvlDS5+sqtsW3quqrT30JkkasWF/\ncf0XwK8AnwUeAH45yY19NiZJGr1hZxJvAH5g4XnTSXYzHxiSpBPYsOck/hPYOLC9Abh/uZ2SvDTJ\nvye5L8mDSf6g1V+
Z5FNJHk7yoSSntfq3tO2Z9v6mo/tzJEkradiQeAWwL8ldSe4CHgImkuxJsmeJ\n/b4GvLGqzgcuAN6UZCvwx8A7q2oL8EVgRxu/A/hiVW0G3tnGSZJGZNjDTb93LB/eDk/9b9t8SXsV\n8EbgF1p9N/D7wE3AtrYO8BHgz5Nk4TCXJGl19f5kuiSnAvcAm4Ebgf8CvlRVz7Yhs8C6tr4O2N++\n89kkB5mfxTx12GdOAVMAGzcOHgWTJK2k5e4C+8m2fGbgLrBfXtge5guq6rmqugBYD7wG+N7Fhi18\n5RLvDX7mrqqarKrJiYmJYdqQJB2DVXsyXVV9qZ3P2AqsTbKmzSbWAwfasFnmT4rPtmdWfDvwhRf7\n3ZKkY7PsieskpyR54Fg+PMlEkrVt/XTgJ4B9wCeAn2/DtvP8syn2tG3a+x/3fIQkjc6y5ySq6hvt\nEtaNVfXoUX7+ecDudl7iFODWqvpYkoeADyb5Q+AzwM1t/M3Ae5PMMD+DuPIov0+StIKGvbrpPODB\nJP8O/N9CsaouW2qnqrofePUi9UeYPz9xeP2rwBVD9iRJ6tmwIfEy4GcHtoO/YZCkE96wIbHm8Mtg\n2zkGSdIJbMmQSPKrwK8Br0oyeBuOM4B/67MxSdLoLTeTeD/wD8AfAdcP1J+pKi9NlaQT3HK/kzgI\nHATesjrtSJLGiY8vlSR1MiQkSZ0MCUlSJ0NCktTJkJAkdTIkJEmdDAlJUidDQpLUyZCQJHUyJCRJ\nnQwJSVInQ0KS1MmQkCR1MiQkSZ16DYkkG5J8Ism+JA8mua7Vz0pyZ5KH2/LMVk+SnUlmktyf5MI+\n+5MkLa3vmcSzwG9W1fcCW4Frknwf8w8w2ltVW4C9PP9Ao0uALe01BdzUc3+SpCX0GhJV9XhV/Udb\nfwbYB6wDtgG727DdwOVtfRtwS827G1ib5Lw+e5QkdVu1cxJJNgGvBj4FnFtVj8N8kADntGHrgP0D\nu8222uGfNZVkOsn03Nxcn21L0kltVUIiycuAvwHeVlVfXmroIrU6olC1q6omq2pyYmJipdqUJB2m\n95BI8hLmA+J9VfXRVn5i4TBSWz7Z6rPAhoHd1wMH+u5RkrS4vq9uCnAzsK+q/nTgrT3A9ra+Hbh9\noH5Vu8ppK3Bw4bCUJGn1ren5818H/BLw2ST3ttrvAu8Abk2yA3gUuKK9dwdwKTADHAKu7rk/SdIS\neg2Jqvoki59nALh4kfEFXNNnT5Kk4fmLa0lSJ0NCktTJkJAkdTIkJEmdDAlJUidDQpLUyZCQJHUy\nJCRJnQwJSVInQ0KS1MmQkCR1MiQkSZ0MCUlSJ0NCktTJkJAkdTIkJEmdDAlJUidDQpLUqdeQSPKX\nSZ5M8sBA7awkdyZ5uC3PbPUk2ZlkJsn9SS7sszdJ0vL6nkn8NfCmw2rXA3uraguwt20DXAJsaa8p\n4Kaee5MkLaPXkKiqfwW+cFh5G7C7re8GLh+o31Lz7gbWJjmvz/4kSUsbxTmJc6vqcYC2PKfV1wH7\nB8bNttoRkkwlmU4yPTc312uzknQyG6cT11mkVosNrKpdVTVZVZMTExM9tyVJJ69RhMQTC4eR2vLJ\nVp8FNgyMWw8cWOXeJEkDRhESe4DtbX07cPtA/ap2ldNW4ODCYSlJ0mis6fPDk3wAuAg4O8ks8Hbg\nHcCtSXYAjwJXtOF3AJcCM8Ah4Oo+e5MkLa/XkKiqt3S8dfEiYwu4ps9+JElHZ5xOXEuSxowhIUnq\nZEhIkjoZEpKkToaEJKmTISFJ6mRISJI6GRKSpE6GhCSpkyEhSepkSEiSOhkSkqROvd7gTyeXd7/7\n3czMzIy6jbF03XXXjbqFkdq8eTPXXnvtqNvQMXAmIUnqlPk7dB+/Jicna3p6etRtSN900UUXHVG7\n6667Vr0PaSlJ7qmqyeXGOZOQJHVyJvEieRxei7nvvvu+uX7++eePsBONm3E5PzPsT
MIT1y/SzMwM\n9z6wj+e+9axRt6IxcgoQ4DngnkeeGHE3GhenHvrCqFs4aobEi/TYY48Bx/dsTCvvG2d8x6hb0Fiq\n9n/G8WPsQiLJm4B3AacC76mqd4y4peU99yynHnp61F1onHzjufnlKaeOtg+Nl+eeHXUHR22sQiLJ\nqcCNwE8Cs8Cnk+ypqodG21m3N7zhDZ6TaB577DG+8pWvjLqNsbDw73D6S08bcSfj4fTTT2fdunWj\nbmMsbN68edQtHJWxCgngNcBMVT0CkOSDwDZgbENiHE5AjQtP4j9v4ZCC/zHOG5eTtTp64xYS64D9\nA9uzwI8cPijJFDAFsHHjxtXpTMvyPwHpxDNuv5PIIrUjzgpX1a6qmqyqyYmJiVVoS5JOTuMWErPA\nhoHt9cCBEfUiSSe9cQuJTwNbkrwyyWnAlcCeEfckSSetsTonUVXPJvl14J+YvwT2L6vqwRG3JUkn\nrbEKCYCqugO4Y9R9SJLG73CTJGmMGBKSpE6GhCSp03F/q/Akc8DnR92HtIizgadG3YTU4Turatkf\nmh33ISGNqyTTw9yvXxpnHm6SJHUyJCRJnQwJqT+7Rt2A9GJ5TkKS1MmZhCSpkyEhSepkSEiSOhkS\nkqROhoQkqdP/A/ItpKDerWjpAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# box-plot after removal of outliers\n", "sns.boxplot(y=\"trip_times\", data =frame_with_durations_modified)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Observations:** Still it seems most of the trip_times are below 50 minutes. But we will stick to this" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. 
Outlier Detection - Speed\n", "**Noe:** Check for any outliers in the data after removing trip duration outliers\n", "\n", "Calculate the \"Speed\" for the modified dataframe again (since trip_time outliers have been removed)" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "collapsed": true }, "outputs": [], "source": [ "frame_with_durations_modified['Speed'] = 60*(frame_with_durations_modified['trip_distance']/frame_with_durations_modified['trip_times'])" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYwAAAD2CAYAAADF97BZAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFfZJREFUeJzt3X+QXWd93/H3x5JtwFBiVhtwbQs5kafBBLDhRk4CNCKx\nzdqkmEzTiRUSLx0YFQJ2QmeacSaeODV0mpZJaU0hoBCNJRrb4UcoYsBrywFit2DqFTH4VwyLELUq\nBokVOAl2sVf+9o97RK/k1eog6ejuXb9fM3f2Ps95zr3f/UP67HN+PCdVhSRJh3PCsAuQJI0GA0OS\n1IqBIUlqxcCQJLViYEiSWjEwJEmtLLnASLIxye4k97YYuzLJZ5P8TZKvJLnkeNQoSaNoyQUGcD0w\n0XLs1cCHq+o84DLgfV0VJUmjbskFRlXdDuwd7Evyk0mmkmxLckeSn9o/HPhHzftnA7uOY6mSNFKW\nD7uA42QD8Oaq+lqS8+nPJH4R+EPg1iRXAKcAFwyvREla3JZ8YCR5JvDzwEeS7O8+ufm5Dri+qv44\nyc8BH0ry01X1xBBKlaRFbckHBv3Dbt+rqnPn2fZGmvMdVfWFJE8DVgC7j2N9kjQSltw5jINV1d8B\n30jyLwDS95Jm8/8GfqnpfwHwNGDPUAqVpEUuS2212iQ3AmvpzxS+DVwDfAb4E+A04ETgpqq6Nsk5\nwJ8Cz6R/Avx3q+rWYdQtSYvdkgsMSVI3lvwhKUnSsdFZYCQ5s7mL+oEk9yX57XnGJMl1SWaaO61f\nOrBtMsnXmtdkV3VKktrp7JBUktOA06rqS0meBWwDXldV9w+MuQS4ArgEOB/4L1V1fpLnANNAj/65\nhW3Ay6rquwt954oVK2rVqlWd/D6StBRt27btO1U13mZsZ5fVVtW3gG817/8+yQPA6cD9A8MuBTZX\nP7XuTPJjTdCsBbZW1V6AJFvpX/5640LfuWrVKqanp4/57yJJS1WSb7Yde1zOYSRZBZwHfPGgTacD\nDw20dzZ9h+qf77PXJ5lOMr1nj1fESlJXOg+M5k7rjwG/09wTccDmeXapBfqf3Fm1oap6VdUbH281\nq5IkHYFOAyPJifTD4s+r6i/nGbITOHOgfQb9BQAP1S9JGpIur5IK8GfAA1X1nw4xbAtweXO11M8C\nDzfnPm4BLkpyapJTgYuaPknSkHQ5w3g58JvALya5u3ldkuTNSd7cjPk0sB2YoX/H9W8BNCe73wHc\n1byu3X8CXBo1s7OzXHnllczOzg67FOmodHmV1P9g/nMRg2MKeOshtm0EN
nZQmnRcbdq0iXvuuYfN\nmzfz9re/fdjlSEfMO72lDs3OzjI1NUVVMTU15SxDI83AkDq0adMmnnii/3iVffv2sXnz5iFXJB05\nA0Pq0G233cbc3BwAc3NzbN26dcgVSUfOwJA6dMEFF7B8ef9U4fLly7nwwguHXJF05AwMqUOTk5Oc\ncEL/n9myZcu4/PLLh1yRdOQMDKlDY2NjTExMkISJiQnGxsaGXZJ0xJ4Kz/SWhmpycpIdO3Y4u9DI\nMzCkjo2NjXHdddcNuwzpqHlISpLUioEhSWrFwJAktWJgSJJaMTAkSa0YGJKkVgwMSVIrBoYkqZXO\nbtxLshH4ZWB3Vf30PNv/DfD6gTpeAIxX1d4kO4C/B/YBc1XV66pOSVI7Xc4wrgcmDrWxqt5VVedW\n1bnA7wF/fdBjWF/VbDcsJGkR6Cwwqup2oO1zuNcBN3ZViyTp6A39HEaSZ9CfiXxsoLuAW5NsS7J+\nOJVJkgYthsUH/xnwPw86HPXyqtqV5MeBrUn+tpmxPEkTKOsBVq5c2X21kvQUNfQZBnAZBx2Oqqpd\nzc/dwMeBNYfauao2VFWvqnrj4+OdFipJT2VDDYwkzwZ+AfjEQN8pSZ61/z1wEXDvcCqUJO3X5WW1\nNwJrgRVJdgLXACcCVNX7m2G/AtxaVd8f2PW5wMeT7K/vhqqa6qpOSVI7nQVGVa1rMeZ6+pffDvZt\nB17STVWSpCO1GM5hSJJGgIEhSWrFwJAktWJgSJJaMTAkSa0YGJKkVgwMSVIrBoYkqRUDQ5LUioEh\nSWrFwJAktWJgSJJaMTAkSa0YGJKkVgwMSVIrBoYkqRUDQ5LUSmeBkWRjkt1J5n0ed5K1SR5Ocnfz\n+oOBbRNJHkwyk+SqrmqUJLXX5QzjemDiMGPuqKpzm9e1AEmWAe8FLgbOAdYlOafDOiVJLXQWGFV1\nO7D3CHZdA8xU1faqegy4Cbj0mBYnSfqRDfscxs8l+XKSm5O8sOk7HXhoYMzOpk+SNETLh/jdXwKe\nX1X/kOQS4L8DZwOZZ2wd6kOSrAfWA6xcubKLOiVJDHGGUVV/V1X/0Lz/NHBikhX0ZxRnDgw9A9i1\nwOdsqKpeVfXGx8c7rVmSnsqGFhhJnpckzfs1TS2zwF3A2UnOSnIScBmwZVh1SpL6OjskleRGYC2w\nIslO4BrgRICqej/wq8BbkswBjwKXVVUBc0neBtwCLAM2VtV9XdUpSWon/f+jl4Zer1fT09PDLkOS\nRkaSbVXVazN22FdJSZJGhIEhSWrFwJAktWJgSJJaMTAkSa0YGJKkVgwMSVIrBoYkqRUDQ5LUioEh\nSWrFwJAktWJgSJJaMTAkSa0YGJKkVgwMSVIrBoYkqRUDQ5LUSmeBkWRjkt1J7j3E9tcn+Urz+nyS\nlwxs25HkniR3J/ERepK0CHQ5w7gemFhg+zeAX6iqFwPvADYctP1VVXVu20cHSpK6tbyrD66q25Os\nWmD75weadwJndFWLJOnoLZZzGG8Ebh5oF3Brkm1J1i+0Y5L1SaaTTO/Zs6fTIiXpqayzGUZbSV5F\nPzBeMdD98qraleTHga1J/raqbp9v/6raQHM4q9frVecFS9JT1FBnGEleDHwQuLSqZvf3V9Wu5udu\n4OPAmuFUKEnab2iBkWQl8JfAb1bVVwf6T0nyrP3vgYuAea+0kiQdP50dkkpyI7AWWJFkJ3ANcCJA\nVb0f+ANgDHhfEoC55oqo5wIfb/qWAzdU1VRXdUqS2unyKql1h9n+JuBN8/RvB17y5D0kScO0WK6S\nkiQtcgaGJKkVA0OS1IqBIUlqxcCQJLViYEiSWjEwJEmtGBiSpFYMDElSKwaGJKmVBZcGSfLShbZX\n1ZeObTmSpMXqcGtJ/XHz82lAD/gyEODFwBc58BkWkqQlbMFDUlX1qqp6FfBN4KVV1auqlwHnATPH\no0BJ0uLQ9hzGT1XVPfsbVXUvcG43J
UmSFqO2y5s/kOSDwH+j/7zt3wAe6KwqSdKi0zYw/iXwFuC3\nm/btwJ90UpEkaVFqdUiqqv4v8H7gqqr6lap6d9O3oCQbk+xOMu8jVtN3XZKZJF8ZvCoryWSSrzWv\nyba/kCSpG60CI8lrgbuBqaZ9bpItLXa9HphYYPvFwNnNaz3NrCXJc+g/0vV8YA1wTZJT29QqSepG\n25Pe19D/j/t7AFV1N7DqcDtV1e3A3gWGXApsrr47gR9LchrwamBrVe2tqu8CW1k4eCRJHWsbGHNV\n9XAH33868NBAe2fTd6h+SdKQtA2Me5P8OrAsydlJ3gN8/hh8f+bpqwX6n/wByfok00mm9+zZcwxK\nkiTNp21gXAG8EPgBcAPwMPA7x+D7dwJnDrTPAHYt0P8kVbWhuaGwNz4+fgxKkiTNp+1VUo9U1e8D\na6vqZ6rq6jZXSbWwBbi8uVrqZ4GHq+pbwC3ARUlObU52X9T0SZKGpNV9GEl+Hvgg8ExgZZKXAP+q\nqn7rMPvdCKwFViTZSf/k+YkAVfV+4NPAJfSXGXmE/v0eVNXeJO8A7mo+6tqqWujkuSSpY21v3Hs3\n/SuXtgBU1ZeT/NPD7VRV6w6zvYC3HmLbRmBjy/okSR1r/TyMqnrooK59x7gWSdIi1naG8VBzWKqS\nnARciWtJSdJTStsZxpvpHzo6Hfg/9FeqnfdQkiRpaWo1w6iq7wCv77gWSdIi1nYtqZ9I8skke5rF\nBD+R5Ce6Lk6StHi0PSR1A/Bh4DTgHwMfAW7sqihJ0uLTNjBSVR+qqrnmtf9BSpKkp4i2gfHZJFcl\nWZXk+Ul+F/hUkuc0S5FLOoSZmRle85rXMDMzM+xSpKOS/r1zhxmUfKN5u3/w4OKAVVWL4nxGr9er\n6enpYZchHeANb3gDO3bsYNWqVVx//fXDLkc6QJJtVdVrM3bBGUaSn0nyvKo6q6rOAv4tcC/wSeBl\nTf+iCAtpMZqZmWHHjh0A7Nixw1mGRtrhDkl9AHgMoFkK5N8Dm+ivVruh29Kk0ffOd75zwbY0Sg53\nH8aygUX/fg3YUFUfAz6W5O5uS5NG3/7ZxaHa0ig53AxjWZL9ofJLwGcGtrVdVkR6ylq1atWCbWmU\nHC4wbgT+OskngEeBOwCSrKZ/WErSAq6++uoF29IoWXCWUFX/Lslf0b9h79b6/5dUnUD/KXySFrB6\n9WpWrVr1w6ukVq9ePeySpCN22PswqurOqvp4VX1/oO+rVfWlbkuTloarr76aU045xdmFRp7nIaSO\nrV69mk996lPDLkM6aq0foHQkkkwkeTDJTJKr5tn+7iR3N6+vJvnewLZ9A9u2dFmnJOnwOpthJFkG\nvBe4ENgJ3JVkS1Xdv39MVb19YPwVwHkDH/FoVZ3bVX2SpB9NlzOMNcBMVW2vqseAm4BLFxi/DlfA\nlaRFq8vAOB0YfA74zqbvSZI8HziLA+/zeFqS6SR3Jnldd2VKktro8qR35uk71EqHlwEfrap9A30r\nq2pX86CmzyS5p6q+/qQvSdYD6wFWrlx5tDVLkg6hyxnGTuDMgfYZwK5DjL2Mgw5HVdWu5ud24HMc\neH5jcNyGqupVVW98fPxoa5YkHUKXgXEXcHaSs5KcRD8UnnS1U5J/ApwKfGGg79QkJzfvVwAvB+4/\neF9J0vHT2SGpqppL8jbgFmAZsLGq7ktyLTBdVfvDYx1w08Bd5AAvAD6Q5An6ofZHg1dXSZKOv1YP\nUBoVPkBJkn40x+wBSpIk7WdgSJJaMTAkSa0YGJKkVgwMSVIrBoYkqRUDQ5LUioEhSWrFwJA6Njs7\ny5VXXsns7OywS5GOioEhdWzTpk3cc889bN68edilSEfFwJA6NDs7y9TUFFXF1NSUswyNNAND6tCm\nTZvYt6//mJe5uTlnGRppBobUodtuu+2HgbFv3z62bt065IqkI2dgSB16xStecUD7la985ZAqkY6e\ng
SF1KJnvScXSaDIwpA7dcccdC7alUWJgSB264IILWL68/2DL5cuXc+GFFw65IunIdRoYSSaSPJhk\nJslV82x/Q5I9Se5uXm8a2DaZ5GvNa7LLOqWuTE5OcsIJ/X9my5Yt4/LLLx9yRdKR6ywwkiwD3gtc\nDJwDrEtyzjxD/6Kqzm1eH2z2fQ5wDXA+sAa4JsmpXdUqdWVsbIyJiQmSMDExwdjY2LBLko5YlzOM\nNcBMVW2vqseAm4BLW+77amBrVe2tqu8CW4GJjuqUOjU5OcmLXvQiZxcaeV0GxunAQwPtnU3fwf55\nkq8k+WiSM3/EfUmyPsl0kuk9e/Yci7qlY2psbIzrrrvO2YVGXpeBMd/1hHVQ+5PAqqp6MXAbsOlH\n2LffWbWhqnpV1RsfHz/iYiVJC+syMHYCZw60zwB2DQ6oqtmq+kHT/FPgZW33lSQdX10Gxl3A2UnO\nSnIScBmwZXBAktMGmq8FHmje3wJclOTU5mT3RU2fNHJc3lxLRWeBUVVzwNvo/0f/APDhqrovybVJ\nXtsMuzLJfUm+DFwJvKHZdy/wDvqhcxdwbdMnjRyXN9dSkap5Tw2MpF6vV9PT08MuQ/qh2dlZ1q1b\nx2OPPcbJJ5/MDTfc4MlvLSpJtlVVr81Y7/SWOrRp0yaeeOIJoL9arbMMjTIDQ+rQbbfdxtzcHNB/\nHobLm2uUGRhSh1xLSkuJgSF1yLWktJQYGFKHXEtKS8nyYRcgLXWTk5Ps2LHD2YVGnoEhdWz/WlLS\nqPOQlCSpFQNDktSKgSF1zLWktFQYGFLHXEtKS4WBIXVodnaWqakpqoqpqSlnGRppBobUIdeS0lJi\nYEgdci0pLSUGhtQh15LSUmJgSB1yLSktJZ0GRpKJJA8mmUly1Tzb/3WS+5N8JclfJXn+wLZ9Se5u\nXlsO3lcaBa4lpaWks6VBkiwD3gtcCOwE7kqyparuHxj2N0Cvqh5J8hbgPwK/1mx7tKrO7ao+6Xhx\nLSktFV3OMNYAM1W1vaoeA24CLh0cUFWfrapHmuadwBkd1iMNxf61pJxdaNR1GRinAw8NtHc2fYfy\nRuDmgfbTkkwnuTPJ67ooUJLUXpeBkXn6at6ByW8APeBdA90rmweT/zrwn5P85CH2Xd8Ey/SePXuO\ntmbpmHNpEC0VXQbGTuDMgfYZwK6DByW5APh94LVV9YP9/VW1q/m5HfgccN58X1JVG6qqV1W98fHx\nY1e9dIy4NIiWii4D4y7g7CRnJTkJuAw44GqnJOcBH6AfFrsH+k9NcnLzfgXwcmDwZLk0EgaXBrn5\n5pudZWikdRYYVTUHvA24BXgA+HBV3Zfk2iSvbYa9C3gm8JGDLp99ATCd5MvAZ4E/OujqKmkkbNq0\niccffxyAxx9/3FmGRlqq5j2tMJJ6vV5NT08Puwzphy6++GIeffTRH7af/vSnc/PNNy+wh3R8JdnW\nnC8+LO/0ljr03Oc+d8G2NEoMDKlD3/72txdsS6PEwJA6tGbNmgPa559//pAqkY6egSF1aPv27Qe0\nv/71rw+pEunoGRhShx566KEF29IoMTCkDp1yyikLtqVRYmBIHfr+97+/YFsaJQaGJKkVA0OS1IqB\nIUlqxcCQJLViYEgdSrJgWxolBobUIQNDS4mBIXXo4MUGn/e85w2pEunoGRhSh3bv3n1A28UHNcoM\nDElSKwaG1KGxsbED2itWrBhSJdLR6zQwkkwkeTDJTJKr5tl+cpK/aLZ/McmqgW2/1/Q/mOTVXdYp\ndcVDUlpKOguMJMuA9wIXA+cA65Kcc9CwNwLfrarVwLuB/9Dsew5wGfBCYAJ4X/N5kqQhWd7hZ68B\nZqpqO0CSm4BLgfsHxlwK/GHz/qPAf03/usNLgZuq6gfAN5LMNJ/3hQ7rPWrvec97mJqaGnYZi8Ij\njzzCUnpe/LG0du3aYZcwVEl4xjOeMewyFoWJiQmuuOKKYZfRWpe
HpE4HBhf/39n0zTumquaAh4Gx\nlvsCkGR9kukk03v27DlGpUuSDtblDGO+O5QO/pPzUGPa7NvvrNoAbADo9XpD/ZP2iiuuGKm/FtS9\n+WYTn/vc5457HdKx0OUMYydw5kD7DGDXocYkWQ48G9jbcl9J0nHUZWDcBZyd5KwkJ9E/ib3loDFb\ngMnm/a8Cn6n+ge8twGXNVVRnAWcD/6vDWqVOHDybcHahUdbZIamqmkvyNuAWYBmwsaruS3ItMF1V\nW4A/Az7UnNTeSz9UaMZ9mP4J8jngrVW1r6taJUmHl6V0JUuv16vp6elhlyFJIyPJtqrqtRnrnd6S\npFYMDElSKwaGJKkVA0OS1MqSOumdZA/wzWHXIc1jBfCdYRchzeP5VTXeZuCSCgxpsUoy3fZKFGmx\n8pCUJKkVA0OS1IqBIR0fG4ZdgHS0PIchSWrFGYYkqRUDQ5LUioEhSWrFwJAktWJgSJJa+X+IE97c\nfyqYXgAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# box-plot for speeds with outliers\n", "sns.boxplot(y=\"Speed\", data =frame_with_durations_modified)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Observations:**\n", "* Again it is difficult to make sense from the box plot.\n", "* Ths higest value is approximately 2x10^8 miles/hr (which is almost the speed of light :-)\n", "* So we move to percentiles." ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Sort the Speed column\n", "var =frame_with_durations_modified[\"Speed\"].values\n", "var = np.sort(var)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 percentile value is 0.0\n", "10 percentile value is 6.409495548961425\n", "20 percentile value is 7.80952380952381\n", "30 percentile value is 8.929133858267717\n", "40 percentile value is 9.98019801980198\n", "50 percentile value is 11.06865671641791\n", "60 percentile value is 12.286689419795222\n", "70 percentile value is 13.796407185628745\n", "80 percentile value is 15.963224893917962\n", "90 percentile value is 20.186915887850468\n", "100 percentile value is 192857142.857\n" ] } ], "source": [ "# calculating speed values at each percntile 0,10,20,30,40,50,60,70,80,90,100 \n", "for i in range(0,100,10):\n", " print(\"{} percentile value 
is {}\".format(i,var[int(len(var)*(i/100))]))\n", "print(\"100 percentile value is \",var[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Zooming in from the 90th percentile to the 100th" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "90 percentile value is 20.186915887850468\n", "91 percentile value is 20.91645569620253\n", "92 percentile value is 21.752988047808763\n", "93 percentile value is 22.721893491124263\n", "94 percentile value is 23.844155844155843\n", "95 percentile value is 25.182552504038775\n", "96 percentile value is 26.80851063829787\n", "97 percentile value is 28.84304932735426\n", "98 percentile value is 31.591128254580514\n", "99 percentile value is 35.7513566847558\n", "100 percentile value is 192857142.857\n" ] } ], "source": [ "# calculating speed values at each percentile 90,91,92,93,94,95,96,97,98,99,100\n", "for i in range(90,100):\n", "    print(\"{} percentile value is {}\".format(i,var[int(len(var)*(i/100))]))\n", "print(\"100 percentile value is \",var[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Zooming in further, from the 99th percentile to the 100th" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "99.0 percentile value is 35.7513566847558\n", "99.1 percentile value is 36.31084727468969\n", "99.2 percentile value is 36.91470054446461\n", "99.3 percentile value is 37.588235294117645\n", "99.4 percentile value is 38.33035714285714\n", "99.5 percentile value is 39.17580340264651\n", "99.6 percentile value is 40.15384615384615\n", "99.7 percentile value is 41.338301043219076\n", "99.8 percentile value is 42.86631016042781\n", "99.9 percentile value is 45.3107822410148\n", "100 percentile value is 192857142.857\n" ] } ], "source": [ "# calculating speed values at each percentile 99.0,99.1,99.2,99.3,99.4,99.5,99.6,99.7,99.8,99.9,100\n", "for i in 
np.arange(0.0, 1.0, 0.1):\n", "    print(\"{} percentile value is {}\".format(99+i,var[int(len(var)*(float(99+i)/100))]))\n", "print(\"100 percentile value is \",var[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Removing further outliers based on the 99.9th percentile value (45.31 miles/hr)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# filter the already-cleaned frame so that the trip-time filter applied earlier is kept\n", "frame_with_durations_modified = frame_with_durations_modified[(frame_with_durations_modified.Speed>0) & (frame_with_durations_modified.Speed<45.31)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Let's calculate the average speed of yellow cabs in New York" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "12.450173996027528" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sum(frame_with_durations_modified[\"Speed\"]) / float(len(frame_with_durations_modified[\"Speed\"]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The average speed in New York is 12.45 miles/hr, so a cab driver can travel approximately 2 miles in 10 minutes on average. \n", "\n", "**This information will be used later**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. 
Outlier Detection - Trip Distance\n", "* Uptill now we have removed the outliers based on trip durations and cab speeds\n", "* Let's try if there are any outliers in trip distances" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYkAAADuCAYAAADMW/vrAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAE1BJREFUeJzt3X2MXFd5x/HvY6+hxKEkTEwUnLiOYrcqVDTAClKoGqfx\nkk3aEKiaKqjBqxZpoQpxoLQV8EdDpKJSiRfVKUVxITBLoTTpi2K1ySZrCweh8ma3aZwXIizYJE6M\nYwaaRAGR7O7TP+Yumix7d8fjvb476+9HGs2cM/fOPLa8/u0599x7IzORJGk+q+ouQJK0fBkSkqRS\nhoQkqZQhIUkqZUhIkkoZEpKkUoaEJKmUISFJKmVISJJKDdRdwPE644wzcuPGjXWXIUl9Zf/+/T/I\nzHWLbdf3IbFx40b27dtXdxmS1Fci4uFutnO6SZJUypCQJJUyJCRJpQwJSVIpQ0KqSKvVYvv27bRa\nrbpLkXpmSEgVaTabHDhwgLGxsbpLkXpmSEgVaLVajI+Pk5mMj487mlDfMiSkCjSbTWZmZgCYnp52\nNKG+ZUhIFdi9ezdTU1MATE1NMTExUXNFUm8MCakCW7duZWCgfUGDgYEBhoaGaq5I6o0hIVVgZGSE\nVavaP16rV69m27ZtNVck9abSkIiIcyLiyxHxYETcHxHXFf0fiojHIuKe4nFZxz4fiIiDEfFQRFxS\nZX1SVRqNBsPDw0QEw8PDNBqNukuSelL1Bf6mgPdl5n9HxIuB/RExOzn7icz8aOfGEfEK4CrglcDL\ngd0R8cuZOV1xndKSGxkZYXJy0lGE+lqlIZGZh4HDxeunI+JBYP0Cu1wBfCkzfwp8LyIOAq8DvlZl\nnVIVGo0GO3bsqLsM6bicsGMSEbEReDXwjaLr3RFxb0TcHBGnF33rgUc7djvEwqEiSarQCQmJiDgV\n+FfgPZn5FPAp4DzgfNojjY/NbjrP7jnP541GxL6I2Hf06NGKqpYkVR4SEbGGdkB8ITP/DSAzj2Tm\ndGbOAP9Ae0oJ2iOHczp2Pxt4fO5nZubOzBzMzMF16xa9sZIkqUdVr24K4DPAg5n58Y7+szo2eytw\nX/F6F3BVRLwwIs4FNgPfrLJGSVK5qlc3vRF4O3AgIu4p+j4IvC0izqc9lTQJvBMgM++PiFuAB2iv\njLrGlU2SVJ+qVzd9lfmPM9y+wD4fBj5cWVGSpK55xrUkqZQhIUkqZUhIkkoZEpKkUoaEJKmUISFJ\nKmVISJJKGRKSpFKGhCSplCEhSSplSEiSShkSkqRShoQkqZQhIUkqZUhIkkoZEpKkUoaEJKmUISFJ\nKmVISBVptVps376dVqtVdylSzwwJqSLNZpMDBw4wNjZWdylSzwwJqQKtVovx8XEyk/HxcUcT6luG\nhFSBZrPJzMwMANPT044m1LcMCakCu3fvZmpqCoCpqSkmJiZqrkjqjSEhVWDr1q0MDAwAMDAwwNDQ\nUM0VSb0xJKQKjIyMsGpV+8dr9erVbNu2reaKpN4YElIFGo0Gw8PDRATDw8M0Go26S5J6MlB3AdJK\nNTIywuTkpKMI9TVDQqpIo9Fgx44ddZchHRenmyRJpQ
wJSVKpSkMiIs6JiC9HxIMRcX9EXFf0vzQi\nJiLiO8Xz6UV/RMSOiDgYEfdGxGuqrE+StLCqRxJTwPsy81eBC4BrIuIVwPuBPZm5GdhTtAEuBTYX\nj1HgUxXXJ0laQKUhkZmHM/O/i9dPAw8C64ErgGaxWRN4S/H6CmAs274OnBYRZ1VZoySp3Ak7JhER\nG4FXA98AzszMw9AOEuBlxWbrgUc7djtU9M39rNGI2BcR+44ePVpl2ZJ0UjshIRERpwL/CrwnM59a\naNN5+vLnOjJ3ZuZgZg6uW7duqcqUJM1ReUhExBraAfGFzPy3ovvI7DRS8fxE0X8IOKdj97OBx6uu\nUZI0v6pXNwXwGeDBzPx4x1u7gJHi9QhwW0f/tmKV0wXAk7PTUpKkE6/qM67fCLwdOBAR9xR9HwQ+\nAtwSEe8AHgGuLN67HbgMOAj8GPijiuuTJC2g0pDIzK8y/3EGgIvn2T6Ba6qsSZLUPc+4liSVMiQk\nSaUMCUlSKUNCqkir1WL79u20Wq26S5F6ZkhIFWk2mxw4cICxsbG6S5F6ZkhIFWi1WoyPj5OZjI+P\nO5pQ3zIkpAo0m01mZmYAmJ6edjShvmVISBXYvXs3U1NTAExNTTExMVFzRVJvDAmpAlu3bmVgoH2u\n6sDAAENDQzVXJPXGkJAqMDIywqpV7R+v1atXs23btporknpjSEgVaDQaXHTRRQBs2bKFRqNRc0VS\nbwwJqSLtS5FJ/c2QkCrQarXYu3cvAHv37nUJrPqWISFVwCWwWikMCakCLoHVSmFISBVwCaxWCkNC\nqoBLYLVSGBJSBRqNBsPDw0QEw8PDLoFV3zqm25dGxNrMfKaqYqSVZGRkhMnJSUcR6mtdjSQi4g0R\n8QDwYNH+9Yj4+0ork/pco9Fgx44djiLU17qdbvoEcAnQAsjM/wV+q6qiJEnLQ9fHJDLz0Tld00tc\niyRpmen2mMSjEfEGICPiBcB2iqknSdLK1e1I4l3ANcB64BBwftGWJK1gXY0kMvMHwB9WXIskaZnp\ndnVTMyJO62ifHhE3V1eWJGk56Ha66VWZ+X+zjcz8EfDqakqSJC0X3YbEqog4fbYRES/lGE/EkyT1\nn27/o/8Y8F8R8S9F+0rgw9WUJElaLroaSWTmGPD7wBHgCeD3MvPzi+0XETdHxBMRcV9H34ci4rGI\nuKd4XNbx3gci4mBEPBQRlxz7H0eStJSOZcro28CPZveJiA2Z+cgi+3wO+Dtg7h1XPpGZH+3siIhX\nAFcBrwReDuyOiF/OTE/ak6SadBUSEXEtcD3tkcQ0EEACr1pov8z8SkRs7LKWK4AvZeZPge9FxEHg\ndcDXutxfkrTEuh1JXAf8SmYu1Y163x0R24B9wPuK1VLrga93bHOo6Ps5ETEKjAJs2LBhiUqSJM3V\n7eqmR4Enl+g7PwWcR/us7cO0D4pDe3QyV873AZm5MzMHM3Nw3bp1S1SWJGmubkcS3wX2RsR/Aj+d\n7czMjx/rF2bmkdnXEfEPwH8UzUPAOR2bng08fqyfL0laOt2OJB4BJoAXAC/ueByziDiro/lWYHbl\n0y7gqoh4YUScC2wGvtnLd0iSlka31266oZcPj4h/ArYAZ0TEIdoHv7dExPm0p5ImgXcW33F/RNwC\nPABMAde4skmS6hWZ8077P3+jiHXAX9BenvoLs/2Z+dvVldadwcHB3LdvX91lSFJfiYj9mTm42Hbd\nTjd9gfZ5EucCN9AeAXyr5+okSX2h25BoZOZngOcy8+7M/GPgggrrkiQtA92ubnqueD4cEb9De9XR\n2dWUJElaLrodSfxVRLwEeB/wZ8CngfdUVpW0ArRaLbZv306rtVTnoEonXrch8aPMfDIz78vMizLz\ntcAPqyxM6nfNZpMDBw4wNjb30mVS/+g2JG7ssk8S7VHE+Pg4mckdd9zhaEJ9a8FjEhHxG8AbgHUR\n8acdb/0isLrKwq
R+1mw2efbZZwF49tlnGRsb473vfW/NVUnHbrGRxAuAU2mHSeeZ1k/Rvr+EpHlM\nTEw8r33XXXfVVIl0fBYcSWTm3cDdEfG5zHwYICJWAadm5lMnokCpHzUaDQ4dOvS8ttSPuj0m8dcR\n8YsRsZb2ZTMeiog/r7Auqa8dPnx4wbbUL7oNiVcUI4e3ALcDG4C3V1aV1OciYsG21C+6DYk1EbGG\ndkjclpnPUXKvB0lw8cUXL9iW+kW3IXET7es1rQW+EhG/RPvgtaR5jI6OsmpV+8dr1apVjI6O1lyR\n1JuuQiIzd2Tm+sy8LNseBi6quDapbzUaDYaGhgAYGhrywLX61mLnSVydmf845xyJTsd8ZzrpZDE6\nOsrhw4cdRaivLXaBv7XFc093oZNOZo1Ggx07dtRdhnRcFjtP4qbiuac700kns1arxQ033MD111/v\ndJP61mLTTQv+GpSZ25e2HGnl6LzAn5fkUL9a7MD1/uLxC8BrgO8Uj/MB7z8tlWi1Wtxxxx1e4E99\nb8GQyMxmZjaBzcBFmXljZt4IXEw7KCTNo9lsMjU1BcBzzz3n5cLVt7o9T+LlPP/g9alFn6R5TExM\nkNk+3zQzvcCf+la3ty/9CPA/EfHlon0h8KFKKpJWgDPPPJPJycnntaV+1FVIZOZnI+IO4PVF1/sz\n8/uz70fEKzPz/ioKlPrRkSNHFmxL/aLb6SYy8/uZeVvx+P6ctz+/xHVJfW1oaOhnF/WLCN70pjfV\nXJHUm65DYhFe4lLqMDIywpo1awBYs2YN27Ztq7kiqTdLFRJeEVbq0Gg0GB4eJiK49NJLPZlOfWup\nQkLSHG9+85s55ZRTuPzyy+suRerZUoXEs0v0OdKKccstt/DMM89w66231l2K1LOuQyIifi8iPh4R\nH4uIt3a+l5kXLH1pUv9qtVo/Ozfizjvv9Ixr9a2uQiIi/h54F3AAuA94Z0R8sov9bo6IJyLivo6+\nl0bERER8p3g+veiPiNgREQcj4t6IeE1vfySpfjfddNPz2jt37qypEun4dDuSuBC4JDM/m5mfBS4D\ntnSx3+eA4Tl97wf2ZOZmYE/RBriU9uU/NgOjwKe6rE1aduaeYX3nnXfWVIl0fLoNiYeADR3tc4B7\nF9spM78C/HBO9xVAs3jdpH3f7Nn+seLOd18HTouIs7qsT5JUgW5DogE8GBF7I2Iv8ACwLiJ2RcSu\nY/zOMzPzMEDx/LKifz3waMd2h4q+nxMRoxGxLyL2HT169Bi/XpLUrW6v3fSXlVbRNt8JefOef5GZ\nO4GdAIODg56jIUkV6fbaTXcv4XceiYizMvNwMZ30RNF/iPY01qyzgceX8HulE+YlL3kJTz755M/a\np512Wo3VSL1bcLopIr5aPD8dEU91PJ6OiKd6/M5dwEjxegS4raN/W7HK6QLgydlpKanfbNq06Xnt\n8847r6ZKpOOz2D2uf7N4fvFC25WJiH+ivQrqjIg4BFxP+7Ljt0TEO4BHgCuLzW+nvWrqIPBj4I96\n+U5pOdi/f/+CbalfLDrdFBGrgHsz89eO9cMz820lb108z7YJXHOs3yFJqs6iq5sycwb434jYsNi2\nkqSVpdslsGcB90fEntllrz0sfZVOGlu2bFmwLfWLbpfAngr8bkc7gL9Z+nKkleHaa69l7969z2tL\n/ajbkBiYuww2Il5UQT3SitBoNNiyZQt79+5ly5Yt3k9CfWuxJbB/EhEHgF8pLro3+/geXVyWQzqZ\nXX311axdu5arr7667lKkni12TOKLwOW0z2G4vOPx2sz0X760gFtvvdX7SajvLRgSmflkZk5m5tsy\n8+GOx9yL9knq0Gq1mJiYAGBiYsL7SahveftSqQI7d+5kZmYGgJmZGe8nob5lSEgV2LNnz4JtqV8Y\nElIFpqenF2xL/cKQkCrQvspMeVvqF4aEJKmUISFJKmVISJJKGRKSpFKGhCSplCEh
SSplSEiSShkS\nkqRShoQkqZQhIUkqZUhIkkoZEpKkUoaEJKmUISFJKmVISJJKGRKSpFKGhCSplCEhSSo1UNcXR8Qk\n8DQwDUxl5mBEvBT4Z2AjMAn8QWb+qK4aJelkV/dI4qLMPD8zB4v2+4E9mbkZ2FO0JUk1qTsk5roC\naBavm8BbaqxFkk56dYZEAndFxP6IGC36zszMwwDF88vm2zEiRiNiX0TsO3r06AkqV5JOPrUdkwDe\nmJmPR8TLgImI+Ha3O2bmTmAnwODgYFZVoCSd7GobSWTm48XzE8C/A68DjkTEWQDF8xN11SdJqikk\nImJtRLx49jXwJuA+YBcwUmw2AtxWR32SpLa6ppvOBP49ImZr+GJmjkfEt4BbIuIdwCPAlTXVJ0mi\nppDIzO8Cvz5Pfwu4+MRXJEmaz3JbAitJWkYMCUlSKUNCklTKkJAklTIkJEmlDAlJUilDQpJUypCQ\nJJUyJCRJpQwJSVIpQ0KSVMqQkCSVMiQkSaUMCUlSKUNCklTKkJAklTIkJEmlDAlJUilDQpJUypCQ\nJJUyJCRJpQwJSVIpQ0KSVMqQkCSVMiQkSaUMCUlSKUNCklRqoO4CtLLceOONHDx4sO4ylqXrrruu\n7hJqs2nTJq699tq6y1APHElIkkotu5FERAwDfwusBj6dmR+puaRF+duztLCDBw+e1COpTv02qlpW\nIRERq4FPAkPAIeBbEbErMx+ot7KF3X333Rz9QQtWL6u/Ti0DAWTx+p77lvU/Y50I01M89thjhsRx\neB1wMDO/CxARXwKuAPzp6hcz05C5+HYrXjsesrM9PVVfOXWLgFWr665CPVhuIbEeeLSjfQh4/dyN\nImIUGAXYsGHDialsARdeeKHTTYXHHnuMn/zkJ3WXsSw888wzP3u9du0pNVZSvxe96EWsX7++7jKW\nhU2bNtVdwjFZbiER8/T93K+lmbkT2AkwODhY+6+t/TR0lKRjsdxWNx0Czulonw08XlMtknTSW24h\n8S1gc0ScGxEvAK4CdtVckySdtJbVdFNmTkXEu4E7aS+BvTkz76+5LEk6aS2rkADIzNuB2+uuQ5K0\n/KabJEnLiCEhSSplSEiSShkSkqRSkX1+CYWIOAo8XHcdUokzgB/UXYQ0j1/KzHWLbdT3ISEtZxGx\nLzMH665D6pXTTZKkUoaEJKmUISFVa2fdBUjHw2MSkqRSjiQkSaUMCUlSKUNCklTKkJAklTIkJEml\n/h96YoABQmyuUwAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# box-plot showing outliers in trip-distance values\n", "sns.boxplot(y=\"trip_distance\", data = frame_with_durations_modified)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Observations:**\n", "* The highest value seems to be more than 250 miles (402 km) which is clearly not a local ride in New York.\n", "* Again the box plot is difficult to make sense and hence we move to percentiles." 
] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Sort the \"trip_distance\" column\n", "var = frame_with_durations_modified[\"trip_distance\"].values\n", "var = np.sort(var)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 percentile value is 0.01\n", "10 percentile value is 0.66\n", "20 percentile value is 0.9\n", "30 percentile value is 1.1\n", "40 percentile value is 1.39\n", "50 percentile value is 1.69\n", "60 percentile value is 2.07\n", "70 percentile value is 2.6\n", "80 percentile value is 3.6\n", "90 percentile value is 5.97\n", "100 percentile value is 258.9\n" ] } ], "source": [ "# calculating trip distance values at each percentile 0,10,20,30,40,50,60,70,80,90,100\n", "for i in range(0,100,10):\n", "    print(\"{} percentile value is {}\".format(i,var[int(len(var)*(i/100))]))\n", "print(\"100 percentile value is \",var[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Zooming in from the 90th percentile to the 100th" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "90 percentile value is 5.97\n", "91 percentile value is 6.45\n", "92 percentile value is 7.07\n", "93 percentile value is 7.85\n", "94 percentile value is 8.72\n", "95 percentile value is 9.6\n", "96 percentile value is 10.6\n", "97 percentile value is 12.1\n", "98 percentile value is 16.03\n", "99 percentile value is 18.17\n", "100 percentile value is 258.9\n" ] } ], "source": [ "# calculating trip distance values at each percentile 90,91,92,93,94,95,96,97,98,99,100\n", "for i in range(90,100):\n", "    print(\"{} percentile value is {}\".format(i,var[int(len(var)*(float(i)/100))]))\n", "print(\"100 percentile value is \",var[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Zooming in further from the 99th percentile to the 100th" ] }, {
"cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "99.0 percentile value is 18.17\n", "99.1 percentile value is 18.37\n", "99.2 percentile value is 18.6\n", "99.3 percentile value is 18.83\n", "99.4 percentile value is 19.13\n", "99.5 percentile value is 19.5\n", "99.6 percentile value is 19.96\n", "99.7 percentile value is 20.5\n", "99.8 percentile value is 21.22\n", "99.9 percentile value is 22.57\n", "100 percentile value is 258.9\n" ] } ], "source": [ "# calculating trip distance values at each percntile 99.0,99.1,99.2,99.3,99.4,99.5,99.6,99.7,99.8,99.9,100\n", "for i in np.arange(0.0, 1.0, 0.1):\n", " print(\"{} percentile value is {}\".format(99+i,var[int(len(var)*(float(99+i)/100))]))\n", "print(\"100 percentile value is \",var[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Removing further outliers based on the 99.9th percentile value" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": true }, "outputs": [], "source": [ "frame_with_durations_modified=frame_with_durations[(frame_with_durations.trip_distance>0) & (frame_with_durations.trip_distance<23)]" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYMAAADuCAYAAADbeWsiAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAADLRJREFUeJzt3X+sX/Vdx/HnqwVCB1Rh3DW1o+vmJVO2DLZcEYFsrcsm\nm0bm4oxEF4xLuinruriojX/olmg2/9iWcrPpakCqblOjEkjsdIasEHTR3SrbCoVwQwa0FHr5YWlE\nNyhv/7jnkktp7/fc9p7vubf3+Ui++X7P53vu/b4gtC8+53zO96SqkCQtbyv6DiBJ6p9lIEmyDCRJ\nloEkCctAkoRlIEnCMpAkYRlIkrAMJEnAGX0HaOvCCy+sDRs29B1DkpaUPXv2PFlVI4P2WzJlsGHD\nBiYmJvqOIUlLSpKH2+znYSJJkmUgSbIMJElYBpIkltAJZGkx2rhx40uvd+/e3VsO6VQ5M5AkWQbS\nyZo9KzjetrSUWAaSJMtAkmQZSJKwDCRJWAaSJCwDSRKWgSQJy0CShGUgScIykCRhGUiSsAwkSVgG\nkiQsA0kSloEkCctAkoRlIEnCMpAkYRlIkrAMJElYBpIkLANJEpaBJImOyyDJRUm+kWRfknuTbG3G\nL0jyL0kebJ7P7zKHJGluXc8MXgA+UVU/DlwB3JDkEmAbcEdVXQzc0WxLknrSaRlU1cGq+s/m9RFg\nH7AOuBbY2ey2E3hflzkkSXMb2jmDJBuAtwL/DqypqoMwXRjAa07wM5uTTCSZmJqaGlZUSVp2hlIG\nSc4F/h74eFU92/bnqmpHVY1V1djIyEh3ASVpmeu8DJKcyXQRfLmq/qEZfiLJ2ub9tcChrnNIkk6s\n69VEAW4C9lXV52a9dTtwffP6euC2LnNIkuZ2Rse//yrgg8B3k9zTjP0e8Bngb5N8CHgE+EDHOSRJ\nc+i0DKrqbiAnePudXX62JKk9r0CWJFkGkiTLQJKEZSBJwjKQJGEZSJKwDCRJWAaSJCwDSRKWgSQJ\ny0CShGUgScIykCRhGUiSsAwkSVgGkiQsA0kSloEkCctAkoRlIEnCMpAkYRlIkrAMJElYBpIkLANJ\nEpaBJAnLQJKEZSBJwjKQJGEZSJKwDCRJWAaSJCwDSRKWgSSJjssgyc1JDiXZO2vsk0kOJLmneby3\nywySpMG6nhncAlxznPHPV9VlzWNXxxkkSQN0WgZVdRfwdJefIUk6dX2dM/hoku80h5HOP9FOSTYn\nmUgyMTU1Ncx8krSszKsMkpyzAJ/5J8CPApcBB4HPnmjHqtpRVWNVNTYyMrIAHy1JOp5WZZDkyiT3\nAfua7UuTfPFkPrCqnqiqo1X1IvBnwOUn83skSQun7czg88DPAE8BVNW3gbefzAcmWTtr8xeAvSfa\nV5I0HGe03bGqHk0ye+jooJ9J8lVgI3Bhkv3AHwAbk1wGFPA94MPzyCtJ6kDbMng0yZVAJTkL+BjN\nIaO5VNV1xxm+aR75JElD0PYw0UeAG4B1wH6mT/7e0FUoSdJwtZoZVNWTwK90nEWS1JO2q4l2Jvnh\nWdvnJ7m5u1iSpGFqe5joLVX13zMbVfUM8NZuIkmShq1tGayYfaVwkguYx0okSdLi1vYv9M8C/5bk\n75rtDwB/1E0kSdKwtT2B/BdJ9gCbgADvr6r7Ok0mSRqa+RzquR94ZuZnkqyvqkc6SSVJGqpWZZBk\nC9NXDz/B9JXHYfoK4rd0F02SNCxtZwZbgTdW1VNdhpEk9aPtaqJHgcNdBpEk9aftzOAhYHeSfwS+\nPzNYVZ/rJJUkaajalsEjzeOs5iFJOo20XVr6qa6DSJL603Y10QjwO8CbgLNnxqvqpzvKJUkaorYn\nkL/M9HUGrwc+xfRNab7VUSZJ0pC1LYNXV9VNwPNVdWdV/TpwRYe5JElD1PYE8vPN88EkPws8Bry2\nm0iSpGFrWwZ/mOSHgE8A48Bq4OOdpZIkDVXbMnimqg4zfeHZJ
oAkV3WWSpI0VG3PGYy3HJMkLUFz\nzgyS/BRwJTCS5LdmvbUaWNllMEnS8Aw6THQWcG6z33mzxp8FfrGrUJKk4ZqzDKrqTuDOJLdU1cMA\nSVYA51bVs8MIKEnqXttzBp9OsjrJOcB9wANJfrvDXJKkIWpbBpc0M4H3AbuA9cAHO0slSRqqtmVw\nZpIzmS6D26rqeabvdCZJOg20LYMvMf19ROcAdyV5HdMnkSVJp4G2X2F9I3DjrKGHk2zqJpIkadgG\nXWfwq1X1V8dcYzCbdzqTpNPAoJnBOc3zeXPuJUla0gZdZ/Cl5tk7nUnSaWzQYaIb53q/qj62sHEk\nSX0YtJpoT/M4G3gb8GDzuAw42m00SdKwDDpMtBMgya8Bm5rrC0jyp8DXB/3yJDcDPwccqqo3N2MX\nAH8DbGB6ueovVdUzJ/1PIEk6ZW2vM/gRXn4S+dxmbJBbgGuOGdsG3FFVFwN3NNuSpB61vbnNZ4D/\nSvKNZvsdwCcH/VBV3ZVkwzHD1wIbm9c7gd3A77bMIUnqQNuLzv48ydeAn2yGtlXV4zPvJ3lTVd3b\n8jPXVNXB5vceTPKaE+2YZDOwGWD9+vUtf70kab7aHiaiqh6vqtuax+PHvP2XC5xr5jN3VNVYVY2N\njIx08RGSJOZRBgNkHvs+kWQtQPN8aIEySJJO0kKVwXy+wfR24Prm9fXAbQuUQZJ0khaqDI4ryVeB\nbwJvTLI/yYeYPhn9riQPAu9qtiVJPWq7mmiQHxxvsKquO8H+71ygz5UkLYDWZZDk/cDVTB8Suruq\nbp15r6qu6CCbJGlIWh0mSvJF4CPAd4G9wIeTfKHLYJKk4Wk7M3gH8OaqKoAkO5kuBknSaaDtCeQH\ngNlXfV0EfGfh40iS+tB2ZvBqYF+S/2i2fwL4ZpLbAarq57sIJ0kajrZl8PudppAk9artdxPd2XUQ\nSVJ/Bt3p7O6qujrJEV5+lXGAqqrVnaaTJA3FoJvbXN08nzfXfpKkpW3gaqIkK5LsHUYYSVI/BpZB\nVb0IfDuJNxSQpNNU29VEa4F7m6Wl/zMz6JJSSTo9tC2Dc5m+sf2MAH+88HEkSX1oWwZnHLu8NMmq\nDvJIknowaGnpbwC/CbwhyeyvnzgP+Ncug0mShmfQzOArwNeATwPbZo0fqaqnO0slSRqqQdcZHAYO\nAye6SY0k6TTQ6W0vJUlLg2UgSbIMJEmWgSQJy0CShGUgScIykCRhGUiSsAwkSVgGkiQsA0kS7b/C\nWnrJ+Pg4k5OTfcdYlLZu3dp3hF6Njo6yZcuWvmPoJDgzkCSRquo7QytjY2M1MTHRdwzpJRs3bnzF\n2O7du4eeQ5pLkj1VNTZoP2cGkiTLQDpZx84CnBVoKbMMJEn9rSZK8j3gCHAUeKHNMS1psbn00ksB\n2L59e89JpFPT99LSTVX1ZM8ZJGnZ8zCRJKnXMijg60n2JNl8vB2SbE4ykWRiampqyPEkafnoswyu\nqqq3Ae8Bbkjy9mN3qKodVTVWVWMjIyPDTyhJy0RvZVBVjzXPh4Bbgcv7yiJJy10vZZDknCTnzbwG\n3g3s7SOLJKm/1URrgFuTzGT4SlX9U09ZJGnZ66UMquoh4NI+PluS9EouLZUkWQaSJMtAkoRlIEnC\nMpAkYRlIkrAMJElYBpIkLANJEpaBJAnLQJJE/7e9XDLGx8eZnJzsO4YWmZn/JrZu3dpzEi02o6Oj\nbNmype8YrVkGLU1OTnLP3n0cfdUFfUfRIrLiBwXAnoee6DmJFpOVzz3dd4R5swzm4eirLuB/f+y9\nfceQtMitun9X3xHmzXMGkiTLQJJkGUiSsAwkSVgGkiQsA0kSloEkCctAkoQXnbV24MABVj53eEle\nTCJpuFY+9xQHDrzQd4x5cWYgSXJm0Na6det4/Ptn+HUUkgZadf8u1q1b03eMeXFmIEmyDCRJloEk\nCc8ZzMvK5552NZFeZsX/P
QvAi2ev7jmJFpPp+xksrXMGlkFLo6OjfUfQIjQ5eQSA0TcsrT/46tqa\nJfd3hmXQ0lK6fZ2GZ+Z2l9u3b+85iXRqPGcgSbIMJEmWgSSJHssgyTVJHkgymWRbXzkkST2VQZKV\nwBeA9wCXANcluaSPLJKk/lYTXQ5MVtVDAEn+GrgWuK+nPJqH8fFxJicn+46xKMz8e5hZVbTcjY6O\nuvJuierrMNE64NFZ2/ubMWlJWbVqFatWreo7hnTK+poZ5Dhj9Yqdks3AZoD169d3nUkt+X9+0umn\nr5nBfuCiWduvBR47dqeq2lFVY1U1NjIyMrRwkrTc9FUG3wIuTvL6JGcBvwzc3lMWSVr2ejlMVFUv\nJPko8M/ASuDmqrq3jyySpB6/m6iqdgF+BagkLQJegSxJsgwkSZaBJAnLQJIEpOoV13otSkmmgIf7\nziEdx4XAk32HkE7gdVU18EKtJVMG0mKVZKKqxvrOIZ0KDxNJkiwDSZJlIC2EHX0HkE6V5wwkSc4M\nJEmWgSQJy0CShGUgScIykCQB/w+7H9icUFRnpwAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# box-plot after removal of outliers\n", "sns.boxplot(y=\"trip_distance\", data = frame_with_durations_modified)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At least it is readable and we will stick to it" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 5. Outlier Detection - Total Fare\n", "* Uptill now we have removed the outliers based on trip durations, cab speeds, and trip distances\n", "* Let's try if there are any outliers in based on the total_amount" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAaIAAADuCAYAAAB75gPMAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGLNJREFUeJzt3XGQlfV97/H3JyDGtDEgrF7KYiHNzjSkbWhygvR6Z67X\nGFy8bSG92gu3DVtLu71eTdJJ2oqZO0ON6TTpNKXFGqf0alwyNcg1SaUZhG5Q2pk2IoeGiGgc9qqR\nLVZWQELiBGbxe/94flsf17N7zp7l7I/Nfl4zz5zn+T6/3/P7rX/45fec73keRQRmZma5vCX3BMzM\nbGpzIjIzs6yciMzMLCsnIjMzy8qJyMzMsnIiMjOzrJyIzMwsKyciMzPLyonIzMyymp57ApPBnDlz\nYsGCBbmnYWY2qezbt+/liGir186JqAELFiygWq3mnoaZ2aQi6buNtPOtOTMzy8qJyMzMspqQRCRp\nmqRvSfp6Ol4oaY+kQ5IekDQjxS9Mx33p/ILSNW5L8WckXVuKd6ZYn6R1pfiYxzAzs4k3USuijwNP\nl44/B2yIiA7gBLA2xdcCJyLiXcCG1A5Ji4BVwHuATuALKblNA+4ClgOLgNWp7ZjHMDOzPFqeiCS1\nA/8V+D/pWMDVwIOpSQ+wMu2vSMek8x9M7VcAWyLidEQ8B/QBS9LWFxHPRsQZYAuwoskxzCadY8eO\n8bGPfYxjx47lnopZ0yZiRfTnwB8Ar6Xj2cArETGYjvuBeWl/HnAYIJ0/mdr/e3xYn5HizYzxBpK6\nJVUlVQcGBsb+V5tNgJ6eHg4cOMDmzZtzT8WsaS1NRJJ+ETgaEfvK4RpNo865cxWvN/7rgYhNEVGJ\niEpbW90yeLMJd+zYMXbs2EFEsGPHDq+KbNJq9YroSuCXJT1PcdvsaooV0kxJQ79hageOpP1+YD5A\nOv8O4Hg5PqzPSPGXmxjDbFLp6enhtdeKGw1nz571qsgmrZYmooi4LSLaI2IBRbHBIxHxa8CjwPWp\nWRfwUNrflo5J5x+JiEjxVanibSHQATwO7AU6UoXcjDTGttRnrGOYTSrf+MY3GBws7j4PDg7S29ub\neUZmzcn1O6JbgU9I6qP4fuaeFL8HmJ3inwDWAUTEQWAr8BSwA7g5Is6m73huAXZSVOVtTW3HPIbZ\nZHPNNdcwfXqx6J8+fTof+tCHMs/IrDnyYqC+SqUSfsSPnW+OHTvG6tWrOXPmDBdeeCH3338/s2e/\nqe7GLBtJ+yKiUq+dn6xgNknNnj2bzs5OJNHZ2ekkZJOWH3pqNol1dXXx/PPPs2bNmtxTMWuaE5HZ\nJDZ79mw2btyYexpm4+Jbc2ZmlpUTkZmZZeVEZGZmWTkRmZlZVk5EZmaWlRORmZll5URkZmZZORGZ\nmVlWTkRmZpaVE5GZmWXlRGRmZlk5EZmZWVZORGZmllVLE5Gkt0p6XNK3JR2UdHuK3yfpOUn707Y4\nxSVpo6Q+SU9Iel/pWl2SDqWtqxR/v6QDqc9GSUrxSyT1pva9kmbVG8PMzCZeq1dEp4GrI+K9wGKg\nU9LSdO73I2Jx2van2HKgI23dwN1QJBVgPXAFsARYP5RYUpvuUr/OFF8H7IqIDmAXr78SvOYYZmaW\nR0sTURS+nw4vSNto7yZfAWxO/R4DZkqaC1wL9EbE8Yg4AfRSJLW5wMUR8c0o3nm+GVhZulZP2u8Z\nFq81hpmZZdDy74gkTZO0HzhKkUz2pFN/lG6NbZB0YYrNAw6Xuven2Gjx/hpxgMsi4kWA9HlpnTGG\nz7tbUlVSdWBgYEx/s5mZNa7liSgizkbEYqAdWCLpZ4DbgJ8GPgBcAtyamqvWJZqIj6ahPhGxKSIq\nEVFpa2urc0kzM2vWhFXNRcQrwG6gMyJeTLfGTgNfpPjeB4rVyfxSt3bgSJ14e404wEtDt9zS59E6\nY5iZWQatrpprkzQz7V8EXAN8p5QgRPHdzZOpyzZgTapsWwqcT
LfVdgLLJM1KRQrLgJ3p3ClJS9O1\n1gAPla41VF3XNSxeawwzM8tgeouvPxfokTSNIultjYivS3pEUhvFbbL9wP9M7bcD1wF9wKvAjQAR\ncVzSHcDe1O7TEXE87d8E3AdcBDycNoDPAlslrQVeAG4YbQwzM8tDRbGZjaZSqUS1Ws09DTOzSUXS\nvoio1GvnJyuYmVlWTkRmZpaVE5GZmWXlRGRmZlk5EZmZWVZORGZmlpUTkZmZZeVEZGZmWTkRmZlZ\nVk5EZmaWlRORmZll5URkZmZZORGZmVlWTkRmZpaVE5GZmWXlRGRmZlm1+lXhb5X0uKRvSzoo6fYU\nXyhpj6RDkh6QNCPFL0zHfen8gtK1bkvxZyRdW4p3plifpHWl+JjHMDOzidfqFdFp4OqIeC+wGOiU\ntBT4HLAhIjqAE8Da1H4tcCIi3gVsSO2QtAhYBbwH6AS+IGlaegX5XcByYBGwOrVlrGOYmVkeLU1E\nUfh+OrwgbQFcDTyY4j3AyrS/Ih2Tzn9QklJ8S0ScjojngD5gSdr6IuLZiDgDbAFWpD5jHcPMzDJo\n+XdEaeWyHzgK9AL/D3glIgZTk35gXtqfBxwGSOdPArPL8WF9RorPbmKM4fPullSVVB0YGGjujzcz\ns7panogi4mxELAbaKVYw767VLH3WWpnEOYyPNsYbAxGbIqISEZW2trYaXczM7FyYsKq5iHgF2A0s\nBWZKmp5OtQNH0n4/MB8gnX8HcLwcH9ZnpPjLTYxhZmYZtLpqrk3SzLR/EXAN8DTwKHB9atYFPJT2\nt6Vj0vlHIiJSfFWqeFsIdACPA3uBjlQhN4OioGFb6jPWMczMLIPp9ZuMy1ygJ1W3vQXYGhFfl/QU\nsEXSZ4BvAfek9vcAX5LUR7FKWQUQEQclbQWeAgaBmyPiLICkW4CdwDTg3og4mK5161jGMDOzPOTF\nQH2VSiWq1WruaZiZTSqS9kVEpV47P1nBzMyyciIyM7OsnIjMzCwrJyIzM8vKicjMzLJyIjIzs6yc\niMzMLCsnIjMzy8qJyMzMsnIiMjOzrJyIzMwsKyciMzPLyonIzMyyciIyM7OsnIjMzCyrhhKRpF2N\nxMzMzMZq1EQk6a2SLgHmSJol6ZK0LQB+ot7FJc2X9KikpyUdlPTxFP9DSf8qaX/ariv1uU1Sn6Rn\nJF1binemWJ+kdaX4Qkl7JB2S9EB6ZTjpteIPpPZ70pxHHcPMzCZevRXR7wD7gJ9On0PbQ8BdDVx/\nEPhkRLwbWArcLGlROrchIhanbTtAOrcKeA/QCXxB0rT0qvG7gOXAImB16TqfS9fqAE4Aa1N8LXAi\nIt4FbEjtRhyjgb/FzMxaYNREFBF/ERELgd+LiHdGxMK0vTci/rLexSPixYj4l7R/CngamDdKlxXA\nlog4HRHPAX3AkrT1RcSzEXEG2AKskCTgauDB1L8HWFm6Vk/afxD4YGo/0hhmZpbB9EYaRcSdkv4j\nsKDcJyI2NzpQujX288Ae4ErgFklrgCrFqukERZJ6rNStn9cT1+Fh8SuA2cArETFYo/28oT4RMSjp\nZGo/2hjl+XYD3QCXX355o3+mmZmNUaPFCl8C/hT4T8AH0lZpdBBJPw58BfjdiPgecDfwU8Bi4EXg\n80NNa3SPJuLNXOuNgYhNEVGJiEpbW1uNLmZmdi40tCKiSDqLIuJN/8OuR9IFFEnobyLiqwAR8VLp\n/F8DX0+H/cD8Uvd24EjarxV/GZgpaXpaFZXbD12rX9J04B3A8TpjmJnZBGv0d0RPAv9hrBdP38nc\nAzwdEX9Wis8tNftwuj7ANmBVqnhbCHQAjwN7gY5UITeDothgW0qMjwLXp/5dFIUUQ9fqSvvXA4+k\n9iONYWZmGTS6IpoDPCXpceD0UDAifrlOvyuBjwAHJO1PsU9RVL0tprgl9jxFdR4RcVDSVuApioq7\nmyPiLICkW4CdwDTg3og4m
K53K7BF0meAb1EkPtLnlyT1UayEVtUbw8zMJp4audsm6T/XikfEP5zz\nGZ2HKpVKVKvV3NMwM5tUJO2LiLr1BI1WzU2JhGNmZhOvoUQk6RSvV5bNAC4AfhARF7dqYmZmNjU0\nuiJ6e/lY0kr8I1AzMzsHmnr6dkT8LcUTDczMzMal0Vtzv1I6fAvF74rG/JsiMzOz4Rot3/6l0v4g\nRcn1inM+GzMzm3Ia/Y7oxlZPxMzMpqZGnzXXLulrko5KeknSVyS1t3pyZmb2o6/RYoUvUjwa5yco\nnlT9dylmZmY2Lo0moraI+GJEDKbtPsCPpDYzs3FrNBG9LOnXh96WKunXgWOtnJiZmU0NjSai3wR+\nFfg3ivcHXZ9iZmZm49Jo1dwLQL0nbZuZmY1Zoz9oXQh8lDe/KtzJyczMxqXRH7T+LcX7ff4OeK11\n0zEzs6mm0UT0w4jY2NKZmJnZlNRoscJfSFov6RckvW9oq9dJ0nxJj0p6WtJBSR9P8Usk9Uo6lD5n\npbgkbZTUJ+mJ8hiSulL7Q5K6SvH3SzqQ+mxMrydvagwzM5t4jSainwV+G/gs8Pm0/WkD/QaBT0bE\nu4GlwM2SFgHrgF0R0QHsSscAy4GOtHUDd0ORVID1wBUUr59YP5RYUpvuUr/OFB/TGGZmlkejt+Y+\nDLwzIs6M5eIR8SJFuTcRcUrS0xRPZlgBXJWa9QC7gVtTfHMU7y9/TNJMSXNT296IOA4gqRfolLQb\nuDgivpnim4GVwMNjHSPN1czMJlijK6JvAzPHM5CkBcDPA3uAy4b+x58+L03N5gGHS936U2y0eH+N\nOE2MMXy+3ZKqkqoDAwNj+VPNzGwMGl0RXQZ8R9Je4PRQsNHybUk/DnwF+N2I+F76Gqdm0xqxaCI+\n6nQa6RMRm4BNAJVKxe9eMjNrkUYT0fpmB5B0AUUS+puI+GoKvzR0Oyzdejua4v3A/FL3duBIil81\nLL47xdtrtG9mDDMzy6ChW3MR8Q+1tnr9UgXbPcDTEfFnpVPbgKHKty7goVJ8TapsWwqcTLfVdgLL\nJM1KRQrLgJ3p3ClJS9NYa4ZdayxjmJlZBo0+WWEpcCfwbmAGMA34QURcXKfrlcBHgAOS9qfYpyiq\n77ZKWgu8ANyQzm0HrgP6gFeBGwEi4rikO4C9qd2nhwoXgJuA+4CLKIoUHk7xMY1hZmZ5qCgeq9NI\nqgKrgP8LVChWHh0R8anWTu/8UKlUolqt5p6GmdmkImlfRFTqtWv0OyIiok/StIg4C3xR0j+Pa4Zm\nZmY0nohelTQD2C/pTyh+G/RjrZuWmZlNFY3+jugjqe0twA8oqs7+W6smZWZmU0ej7yP6btr9IXD7\n8POSvhIRTkxmZjZmja6I6nnnObqOmZlNMecqEfnJA2Zm1pRzlYjMzMyacq4S0YgPjzMzMxvNuUpE\nt56j65iZ2RQzatWcpAPU/v5HQETEz1Hs/H0L5mZmZlNAvfLtX5yQWZiZ2ZQ1aiIq/X7IzMysJRr6\njii9ZmGvpO9LOiPprKTvtXpyZmb2o6/RYoW/BFYDhyhet/BbFK+FMDMzGxc/fdvMzLLy07fNzCyr\n8Tx9+1fqdZJ0r6Sjkp4sxf5Q0r9K2p+260rnbpPUJ+kZSdeW4p0p1idpXSm+UNIeSYckPZCSJZIu\nTMd96fyCemOYmVkejSailRHxw4j4XkTcHhGfoLHS7vuAzhrxDRGxOG3bASQtongL7HtSny9ImiZp\nGnAXsBxYBKxObQE+l67VAZwA1qb4WuBERLwL2JDajThGg/8NzMysBRpNRF01Yr9Rr1NE/CNwvMEx\nVgBbIuJ0RDwH9AFL0tYXEc9GxBlgC7BCkoCrgQdT/x5gZelaPWn/QeCDqf1IY5iZWSb1nqywGvgf\nwEJJ20qnLgaOjWPcWyStAarAJyPiBDAPeKzUpj/FAA4Pi18BzAZeiYjBGu3nDfWJiEFJJ1P
70cZ4\nA0ndQDfA5Zdf3sSfaGZmjahXrPDPFIUJc4DPl+KngCeaHPNu4A6KRwfdka77m9R+cGpQe9UWo7Rn\nlHOj9XljMGITsAmgUqn4NRdmZi0y6q25iPhuROyOiF8AvgO8PW39pZXImETESxFxNiJeA/6a12+N\n9VMUQQxpB46MEn8ZmClp+rD4G66Vzr+D4hbhSNcyM7NMGn2ywg3A48ANwK8CeyRd38yAkuaWDj8M\nDFXUbQNWpYq3hUBHGnMv0JEq5GZQFBtsi4gAHgWG5tEFPFS61tD3WtcDj6T2I41hZmaZNPo7ov8N\nfCAijgJIagO+weuFAjVJ+jJwFTBHUj+wHrhK0mKKW2LPA78DEBEHJW0FngIGgZvTj2eRdAuwE5gG\n3BsRB9MQtwJbJH0G+BZwT4rfA3xJUh/FSmhVvTHMzCwPFQuFOo2kAxHxs6XjtwDfLsd+lFUqlahW\nq7mnYWY2qUjaFxGVeu0aXRE9LGkn8OV0/N+B7c1OzszMbEijvyMK4K+AnwPeS6omMzMzG69GV0Qf\niohbga8OBSTdjl8RbmZm41TvB603Af8LeKek8u+G3g78UysnZmZmU0O9FdH9wMPAHwPrSvFTEdHo\no3vMzMxGVO9V4SeBkxQvxTMzMzvnGi1WMDMzawknIjMzy8qJyMzMsnIiMjOzrJyIzMwsKyciMzPL\nyonIzMyyciIyM7OsnIjMzCwrJyIzM8uqpYlI0r2Sjkp6shS7RFKvpEPpc1aKS9JGSX2SnpD0vlKf\nrtT+kKSuUvz9kg6kPhslqdkxzMwsj1aviO4DOofF1gG7IqID2MXrD1NdDnSkrRu4G4qkQvGK8SuA\nJcD6ocSS2nSX+nU2M4aZmeXT0kQUEf8IDH9K9wqgJ+33ACtL8c1ReAyYKWkucC3QGxHHI+IE0At0\npnMXR8Q3o3jf+eZh1xrLGGZmlkmO74gui4gXAdLnpSk+DzhcatefYqPF+2vEmxnjTSR1S6pKqg4M\nDIzpDzQzs8adT8UKqhGLJuLNjPHmYMSmiKhERKWtra3OZc3MrFk5EtFLQ7fD0ufRFO8H5pfatQNH\n6sTba8SbGcPMzDLJkYi2AUOVb13AQ6X4mlTZthQ4mW6r7QSWSZqVihSWATvTuVOSlqZquTXDrjWW\nMczMLJN6rwofF0lfBq4C5kjqp6h++yywVdJa4AXghtR8O3Ad0Ae8CtwIEBHHJd0B7E3tPl16TflN\nFJV5F1G80vzhFB/TGGZmlo+KgjMbTaVSiWq1mnsaZmaTiqR9EVGp1+58KlYwM7MpyInIzMyyciIy\nM7OsnIjMzCwrJyIzM8vKicjMzLJyIjIzs6yciMzMLCsnIjMzy8qJyMzMsnIiMjOzrJyIzMwsKyci\nMzPLyonIzMyyciIyM7OsnIjMzCyrbIlI0vOSDkjaL6maYpdI6pV0KH3OSnFJ2iipT9ITkt5Xuk5X\nan9IUlcp/v50/b7UV6ONYWZmeeReEf2XiFhceoPfOmBXRHQAu9IxwHKgI23dwN1QJBWK149fASwB\n1pcSy92p7VC/zjpjmJlZBrkT0XArgJ603wOsLMU3R+ExYKakucC1QG9EHI+IE0Av0JnOXRwR34zi\nXeibh12r1hhmZpZBzkQUwN9L2iepO8Uui4gXAdLnpSk+Dzhc6tufYqPF+2vERxvjDSR1S6pKqg4M\nDDT5J5qZWT3TM459ZUQckXQp0CvpO6O0VY1YNBFvWERsAjYBVCqVMfU1M7PGZVsRRcSR9HkU+BrF\ndzwvpdtqpM+jqXk/ML/UvR04UifeXiPOKGOYmVkGWRKRpB+T9PahfWAZ8CSwDRiqfOsCHkr724A1\nqXpuKXAy3VbbCSyTNCsVKSwDdqZzpyQtTdVya4Zdq9YYZmaWQa5bc5cBX0sV1dOB+yNih6S9wFZJ\na4EXgBtS++3AdUAf8CpwI0BEHJd0B7A3tft0RBxP+zc
B9wEXAQ+nDeCzI4xhZmYZqCgqs9FUKpWo\nVqu5p2FmNqlI2lf6ec6IzrfybTMzm2KciMzMLCsnIjMzy8qJyMzMsnIiMjOzrJyIzMwsKyciMzPL\nyonIzMyyciIyM7OsnIjMzCwrJyIzM8vKicjMzLJyIjIzs6yciMzMLCsnIjMzyyrXi/HM7By46qqr\n/n1/9+7d2eZhNh5TckUkqVPSM5L6JK3LPR8zs6lsyiUiSdOAu4DlwCJgtaRFeWdlNnbl1VCtY7PJ\nYiremlsC9EXEswCStgArgKeyzqqOO++8kx07duSexnnh1Vdfxa+4r20qJyNJvO1tb8s9jfNCZ2cn\nH/3oR3NPo2FTbkUEzAMOl477U+wNJHVLqkqqDgwMTNjkzMymGk21f1lKugG4NiJ+Kx1/BFgSESP+\n86FSqUS1Wp2oKZo1pNbqxwULdj6RtC8iKvXaTcUVUT8wv3TcDhzJNBczsylvKiaivUCHpIWSZgCr\ngG2Z52Q2ZsNXP14N2WQ15YoVImJQ0i3ATmAacG9EHMw8LTOzKWvKJSKAiNgObM89D7Px8irIfhRM\nxVtzZmZ2HnEiMjOzrJyIzMwsKyciMzPLasr9oLUZkgaA7+aeh9kI5gAv556EWQ0/GRFt9Ro5EZlN\ncpKqjfx63ex85VtzZmaWlRORmZll5URkNvltyj0Bs/Hwd0RmZpaVV0RmZpaVE5GZmWXlRGRmZlk5\nEZmZWVZORGZmltX/B9h513R0XE0lAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# box-plot showing outliers in fare\n", "sns.boxplot(y=\"total_amount\", data = frame_with_durations_modified)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Observations:**\n", "* The highest fare amount is approximately /$4million ;-P\n", "* So we move to percentiles again." 
] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Sort the total_amount column\n", "var = frame_with_durations_modified[\"total_amount\"].values\n", "var = np.sort(var)" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 percentile value is -242.55\n", "10 percentile value is 6.3\n", "20 percentile value is 7.8\n", "30 percentile value is 8.8\n", "40 percentile value is 9.8\n", "50 percentile value is 11.16\n", "60 percentile value is 12.8\n", "70 percentile value is 14.8\n", "80 percentile value is 18.3\n", "90 percentile value is 25.8\n", "100 percentile value is 3950611.6\n" ] } ], "source": [ "# calculating total fare amount values at each percentile 0,10,20,30,40,50,60,70,80,90,100\n", "for i in range(0,100,10):\n", "    print(\"{} percentile value is {}\".format(i,var[int(len(var)*(i/100))]))\n", "print(\"100 percentile value is \",var[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Observations:** There are some negative fare amounts and some extremely large amounts; both must be discarded.\n", "\n", "Zooming in from the 90th percentile to the 100th percentile" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "90 percentile value is 25.8\n", "91 percentile value is 27.3\n", "92 percentile value is 29.3\n", "93 percentile value is 31.8\n", "94 percentile value is 34.8\n", "95 percentile value is 38.53\n", "96 percentile value is 42.6\n", "97 percentile value is 48.13\n", "98 percentile value is 58.13\n", "99 percentile value is 66.13\n", "100 percentile value is 3950611.6\n" ] } ], "source": [ "# calculating total fare amount values at each percentile 90,91,92,93,94,95,96,97,98,99,100\n", "for i in range(90,100):\n", "    print(\"{} percentile value is {}\".format(i,var[int(len(var)*(i/100))]))\n", "print(\"100
percentile value is \",var[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Zooming in further from the 99th percentile to the 100th" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "99.0 percentile value is 66.13\n", "99.1 percentile value is 68.13\n", "99.2 percentile value is 69.6\n", "99.3 percentile value is 69.6\n", "99.4 percentile value is 69.73\n", "99.5 percentile value is 69.75\n", "99.6 percentile value is 69.76\n", "99.7 percentile value is 72.58\n", "99.8 percentile value is 75.35\n", "99.9 percentile value is 88.28\n", "100 percentile value is 3950611.6\n" ] } ], "source": [ "# calculating total fare amount values at each percentile 99.0,99.1,99.2,99.3,99.4,99.5,99.6,99.7,99.8,99.9,100\n", "for i in np.arange(0.0, 1.0, 0.1):\n", "    print(\"{} percentile value is {}\".format(99+i,var[int(len(var)*(float(99+i)/100))]))\n", "print(\"100 percentile value is \",var[-1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Observation:** Even the 99.9th percentile value does not look like an outlier, since it differs little from the 99.8th percentile value, so let's do some more graphical analysis.\n", "\n", "The plot below shows the sorted fare values at the upper end, where we look for a sharp increase; values beyond it are removed as outliers." ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "image/png":
"iVBORw0KGgoAAAANSUhEUgAAAYAAAAEJCAYAAACdePCvAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAFtZJREFUeJzt3X2QZXWd3/H3p3seGBVhkFZxZnDQDEY0Ea0JkrhxUXYB\nSdWOW6WpoVwlhmR2N7i1W9n8gbt/aDRUmVSUilVIMqtTorWKrLphsjVZw6KW+wQyKCIDKiOg9A4L\noyDCIgPd/c0f9/TM7Z5+uN3TfW/PnPer5s4953d+55zv6Xv7fvo83HtTVUiS2mdo0AVIkgbDAJCk\nljIAJKmlDABJaikDQJJaygCQpJYyACSppQwASWopA0CSWmrVoAuYy+mnn16bN28edBmSdFy54447\nflJVI/P1W9EBsHnzZvbu3TvoMiTpuJLkR7308xCQJLWUASBJLWUASFJLGQCS1FIGgCS11LwBkOSk\nJN9M8p0k+5L856b9rCS3JbkvyReSrGna1zbj+5vpm7uW9f6m/ftJLl6ujZIkza+XPYBDwFur6nXA\nucAlSc4H/itwTVVtAR4Hrmj6XwE8XlX/CLim6UeSc4DtwGuAS4BPJBleyo2RJPVu3gCojqea0dXN\nrYC3Al9s2q8H3t4Mb2vGaaZfmCRN+w1VdaiqHgD2A+ctyVZI0gnkS3eM8rnbfrzs6+npHECS4SR3\nAo8CNwM/BH5WVWNNl1FgQzO8AXgIoJn+BPCi7vYZ5pEkNW76zgFu3PvQ/B2PUU8BUFXjVXUusJHO\nX+2vnqlbc59Zps3WPkWSHUn2Jtl78ODBXsqTJC3Cgq4CqqqfAV8HzgdOTTL5URIbgQPN8CiwCaCZ\nfgrwWHf7DPN0r2NnVW2tqq0jI/N+lIUkaZF6uQpoJMmpzfA64FeAe4GvAe9oul0O3NQM727GaaZ/\ntaqqad/eXCV0FrAF+OZSbYgkaWF6+TC4M4Drmyt2hoAbq+rPktwD3JDkvwDfBj7V9P8U8Nkk++n8\n5b8doKr2JbkRuAcYA66sqvGl3RxJUq/mDYCqugt4/Qzt9zPDVTxV9QzwzlmWdTVw9cLLlCQtNd8J\nLEktZQBIUksZAJLUUgaAJLWUASBJLWUASFJLGQCS1FIGgCS1lAEgSS1lAEhSSxkAktRSBoAktZQB\nIEktZQBIUksZAJLUUgaAJLWUASBJLWUASFJLGQCS1FIGgCS1lAEgSS01bwAk2ZTka0nuTbIvye82\n7R9M8ndJ7mxul3bN8/4k+5N8P8nFXe2XNG37k1y1PJskSerFqh76jAG/X1XfSnIycEeSm5tp11TV\nf+/unOQcYDvwGuBlwF8kObuZfC3wq8AocHuS3VV1z1JsiCRpYeYNgKp6GHi4GX4yyb3Ahjlm2Qbc\nUFWHgAeS7AfOa6btr6r7AZLc0PQ1ACRpABZ0DiDJZuD1wG1N0/uS3JVkV5L1TdsG4KGu2Uabttna\nJUkD0HMAJHkB8CXg96rq58B1wCuBc+nsIXx0susMs9cc7dPXsyPJ3iR7Dx482Gt5kqQF6ikAkqym\n8+L/x1X1ZYCqeqSqxqtqAvgjjhzmGQU2dc2+ETgwR/sUVbWzqrZW1daRkZGFbo8kqUe9XAUU4FPA\nvVX1sa72M7q6/TpwdzO8G9ieZG2Ss4AtwDeB24EtSc5KsobOieLdS7MZkqSF6uUqoDcB7wa+m+TO\npu0PgMuSnEvnMM6DwG8CVNW+JDfSObk7BlxZVeMASd4HfAUYBnZV1b4l3BZJ0gL0chXQXzHz8fs9\nc8xzNXD1DO175ppPktQ/vhNYklrKAJCkljIAJKmlDABJaikDQJJaygCQpJYyACSppQwASWopA0CS\nWsoAkKSWMgAkqaUMAElqKQNAklrKAJCkljIAJKmlDABJaikDQJJaygCQpJYyACSppQwASWopA0CS\nWsoAkKSWmjcAkmxK8rUk9ybZl+R3m/bTktyc5L7mfn3TniQfT
7I/yV1J3tC1rMub/vcluXz5NkuS\nNJ9e9gDGgN+vqlcD5wNXJjkHuAq4paq2ALc04wBvA7Y0tx3AddAJDOADwBuB84APTIaGJKn/5g2A\nqnq4qr7VDD8J3AtsALYB1zfdrgfe3gxvAz5THbcCpyY5A7gYuLmqHquqx4GbgUuWdGskST1b0DmA\nJJuB1wO3AS+pqoehExLAi5tuG4CHumYbbdpma5++jh1J9ibZe/DgwYWUJ0lagJ4DIMkLgC8Bv1dV\nP5+r6wxtNUf71IaqnVW1taq2joyM9FqeJGmBegqAJKvpvPj/cVV9uWl+pDm0Q3P/aNM+Cmzqmn0j\ncGCOdknSAPRyFVCATwH3VtXHuibtBiav5LkcuKmr/T3N1UDnA080h4i+AlyUZH1z8veipk2SNACr\neujzJuDdwHeT3Nm0/QHwEeDGJFcAPwbe2UzbA1wK7AeeBt4LUFWPJfkwcHvT70NV9diSbIUkacHm\nDYCq+itmPn4PcOEM/Qu4cpZl7QJ2LaRASdLy8J3AktRSBoAktZQBIEktZQBIUksZAJLUUgaAJLWU\nASBJLWUASFJLGQCS1FIGgCS1lAEgSS1lAEhSSxkAktRSBoAktZQBIEktZQBIUksZAJLUUgaAJLWU\nASBJLWUASFJLGQCS1FLzBkCSXUkeTXJ3V9sHk/xdkjub26Vd096fZH+S7ye5uKv9kqZtf5Krln5T\nJEkL0csewKeBS2Zov6aqzm1uewCSnANsB17TzPOJJMNJhoFrgbcB5wCXNX0lSQOyar4OVfWNJJt7\nXN424IaqOgQ8kGQ/cF4zbX9V3Q+Q5Iam7z0LrliStCSO5RzA+5Lc1RwiWt+0bQAe6uoz2rTN1i5J\nGpDFBsB1wCuBc4GHgY827Zmhb83RfpQkO5LsTbL34MGDiyxPkjSfRQVAVT1SVeNVNQH8EUcO84wC\nm7q6bgQOzNE+07J3VtXWqto6MjKymPIkST1YVAAkOaNr9NeBySuEdgPbk6xNchawBfgmcDuwJclZ\nSdbQOVG8e/FlS5KO1bwngZN8HrgAOD3JKPAB4IIk59I5jPMg8JsAVbUvyY10Tu6OAVdW1XiznPcB\nXwGGgV1VtW/Jt0aS1LNergK6bIbmT83R/2rg6hna9wB7FlSdJGnZ+E5gSWopA0CSWsoAkKSWMgAk\nqaUMAElqKQNAklrKAJCkljIAJKmlDABJaikDQJJaygCQpJYyACSppQwASWopA0CSWsoAkKSWMgAk\nqaUMAElqKQNAklrKAJCkljIAJKmlDABJaikDQJJaat4ASLIryaNJ7u5qOy3JzUnua+7XN+1J8vEk\n+5PcleQNXfNc3vS/L8nly7M5kqRe9bIH8GngkmltVwG3VNUW4JZmHOBtwJbmtgO4DjqBAXwAeCNw\nHvCBydCQJA3GvAFQVd8AHpvWvA24vhm+Hnh7V/tnquNW4NQkZwAXAzdX1WNV9ThwM0eHiiSpjxZ7\nDuAlVfUwQHP/4qZ9A/BQV7/Rpm229qMk2ZFkb5K9Bw8eXGR5kqT5LPVJ4MzQVnO0H91YtbOqtlbV\n1pGRkSUtTpJ0xGID4JHm0A7N/aNN+yiwqavfRuDAHO2SpAFZbADsBiav5LkcuKmr/T3N1UDnA080\nh4i+AlyUZH1z8veipk2SNCCr5uuQ5PPABcDpSUbpXM3zEeDGJFcAPwbe2XTfA1wK7AeeBt4LUFWP\nJfkwcHvT70NVNf3EsiSpj+YNgKq6bJZJF87Qt4ArZ1nOLmDXgqqTJC0b3wksSS1lAEhSSxkAktRS\nBoAktZQBIEktZQBIUksZAJLUUgaAJLWUASBJLWUASFJLGQCS1FIGgCS1lAEgSS1lAEhSSxkAktRS\nBoAktZQBIEktZQBIUksZAJLUUgaAJLWUASBJLXVMAZDkwSTfTXJnkr1N22lJbk5yX3O/vmlPko8n\n2Z/kriRvWIoNkCQtzlLsA
bylqs6tqq3N+FXALVW1BbilGQd4G7Clue0ArluCdUuSFmk5DgFtA65v\nhq8H3t7V/pnquBU4NckZy7B+SVIPjjUACvh/Se5IsqNpe0lVPQzQ3L+4ad8APNQ172jTJkkagFXH\nOP+bqupAkhcDNyf53hx9M0NbHdWpEyQ7AM4888xjLE+SNJtj2gOoqgPN/aPAnwLnAY9MHtpp7h9t\nuo8Cm7pm3wgcmGGZO6tqa1VtHRkZOZbyJElzWHQAJHl+kpMnh4GLgLuB3cDlTbfLgZua4d3Ae5qr\ngc4Hnpg8VCRJ6r9jOQT0EuBPk0wu53NV9edJbgduTHIF8GPgnU3/PcClwH7gaeC9x7BuSTphVR11\ndHxZLDoAqup+4HUztP8UuHCG9gKuXOz6JKlNMtNZ0yXmO4ElaYWpmvmqmaVmAEjSClMU6cMugAEg\nSStMFQx5CEiS2meiivThIJABIEkrTBV9OQlgAEjSCtOn138DQJJWnPIyUElqpaIY8iogSWqfCfcA\nJKmdyquAJKmdCvcAJKmV+vRZcAaAJK00BZ4ElqQ2qioPAUlSG/lpoJLUUn4aqCS1lHsAktRSnTeC\nuQcgSa0zMVF+H4Aktc2hsXEe/Ok/sHH985Z9XYv+UnhJ0rF5+tkxfvrUs3zv75/kjh89zrd+9Dg/\nPPgUh8Ym+Jdnn77s6+97ACS5BPgfwDDwyar6SL9rkKR+mJgonnxmjJ/8wyEOPnmIR37+DKOP/4If\nPvoU3/v7J/nBI08yNtF52+/wUHjthlP4Z5tP44JXjXDB2SPLXl9fAyDJMHAt8KvAKHB7kt1VdU8/\n65CkSVXF2EQxNl48Oz7Bc5O3sc74M8+Nc2hsgmfHJnhmbJxfPNu5Pf3cOE8fGuOpQ2P8/BfP8UTX\n7We/eI4nnn6Ox59+lokZPtbhpS88iVe99GTefPYIrxh5Pi87ZR1bN6/npNXDfd32fu8BnAfsr6r7\nAZLcAGwDWhcAVXX48z6qGT8yPNnetNX0eY+eXoenHVnOlHkXMU9R0+ala96a1nfasmaoebZ1T13v\n9J/LkVoOD8+w/oWu4+g+ne9hPbyMOnp7Z3u8mL7eaf0Wsn3Tf47dtU1U1/Jq6jw1bZymX3XNM2P9\ncywHOn/BTnT3aZZ15GfVGZ78eU1MTJ0OU3+u4xOdZUxuz+S0qePFxMTUbR6vYnyimW8Cxpt+4xNH\n5js8PNHpPzHReWGfnG+8GR6v4rmxic6LftN2LBJ44UmreeG6VZy6bg2nrFvNGaeu45R1qznteWtY\n//w1vOj5azj9BWt56Slredmp63jempVx9L3fVWwAHuoaHwXeuBwr+tD/uYfbHvgpY+PF2MTE4YQf\nbx70qS+4U39Ju8envhBM/0Wa+st81AsBM7+wS8ezoXQuUQzN59V0/jGUkDT3ADnSFjrzDA+Foaa9\nu/9QmDbetazAqqEcNf+q4SHWrgpDTdtwuoaHwvDQEMOhcz802RaGm3lXDYVVw2HV0OTwEKuHw9pV\nQ6we7txWDYeTVg9z0uph1gwPsXb1EM9bM8y61cOsWzPMC9auYt3q4b5csrkc+h0AM/2UprwsJtkB\n7AA488wzF72iz976IMND4ZfPHjnyYDcP9OSTZPIxC0eepE0NXfUcmT65AcmRPjn830zLOXpeuuab\nnH6kb9d8055Q3bVOHZ+6ru5pR+bNUX2P1DDz8ph1npnr7Z7GUbVN3ea56p18YZn+2Eyfn6Pa514H\nmVrPbD+3oUzdvhw1Pvv653osZtyeWZ5bMHWdMPVF8fC0rp/VlOfetPGZfqaEGadPr284mTr9OH2h\n08z6HQCjwKau8Y3Age4OVbUT2AmwdevWRf/NXAX/7pdewX+6+FWLXYQkndD6/T6A24EtSc5KsgbY\nDuxejhVNVH/eSCFJx6u+7gFU1ViS9wFfoXMZ6K6q2rcc6+rXW6kl6XjV91PRVbUH2LPM6wD
684UK\nknS8OiE/CmLyqi5f/yVpdidoAEzuAQy4EElawU7oAPAcgCTN7oQMgMk3XHkOQJJmd0IGgIeAJGl+\nJ2gAdO7dA5Ck2Z2gATB5DmDAhUjSCnZCBoDnACRpfidoALgHIEnzOSEDwHMAkjS/EzIAVg2Hf/VP\nzuDlL1r+L1WWpOPVyvhamiX2wpNWc+273jDoMiRpRTsh9wAkSfMzACSppQwASWopA0CSWsoAkKSW\nMgAkqaUMAElqKQNAkloqk5+bsxIlOQj86BgWcTrwkyUqZxCsf7Csf7Csf/FeXlUj83Va0QFwrJLs\nraqtg65jsax/sKx/sKx/+XkISJJaygCQpJY60QNg56ALOEbWP1jWP1jWv8xO6HMAkqTZneh7AJKk\nWRz3AZDkkiTfT7I/yVUzTF+b5AvN9NuSbO5/lbProf7/mOSeJHcluSXJywdR51zm24aufu9IUklW\n1JURvdSf5F83j8O+JJ/rd41z6eE5dGaSryX5dvM8unQQdc4kya4kjya5e5bpSfLxZtvuSrKivuij\nh/rf1dR9V5K/SfK6ftc4p6o6bm/AMPBD4BXAGuA7wDnT+vwH4H82w9uBLwy67gXW/xbgec3wb6+k\n+nvdhqbfycA3gFuBrYOue4GPwRbg28D6ZvzFg657gfXvBH67GT4HeHDQdXfV9mbgDcDds0y/FPi/\nQIDzgdsGXfMC6/8XXc+bt620+o/3PYDzgP1VdX9VPQvcAGyb1mcbcH0z/EXgwmTFfFnwvPVX1deq\n6ulm9FZgY59rnE8vjwHAh4H/BjzTz+J60Ev9/x64tqoeB6iqR/tc41x6qb+AFzbDpwAH+ljfnKrq\nG8Bjc3TZBnymOm4FTk1yRn+qm9989VfV30w+b1iBv7/HewBsAB7qGh9t2mbsU1VjwBPAi/pS3fx6\nqb/bFXT+GlpJ5t2GJK8HNlXVn/WzsB718hicDZyd5K+T3Jrkkr5VN79e6v8g8BtJRoE9wO/0p7Ql\nsdDfkZVsxf3+Hu/fCTzTX/LTL2vqpc+g9Fxbkt8AtgK/vKwVLdyc25BkCLgG+Df9KmiBenkMVtE5\nDHQBnb/g/jLJa6vqZ8tcWy96qf8y4NNV9dEk/xz4bFP/xPKXd8xW8u9vz5K8hU4A/NKga+l2vO8B\njAKbusY3cvTu7eE+SVbR2QWea5ezn3qpnyS/Avwh8GtVdahPtfVqvm04GXgt8PUkD9I5jrt7BZ0I\n7vU5dFNVPVdVDwDfpxMIK0Ev9V8B3AhQVX8LnETnc2qOBz39jqxkSf4p8ElgW1X9dND1dDveA+B2\nYEuSs5KsoXOSd/e0PruBy5vhdwBfreaMzAowb/3N4ZP/RefFfyUde5405zZU1RNVdXpVba6qzXSO\ng/5aVe0dTLlH6eU59L/pnIwnyel0Dgnd39cqZ9dL/T8GLgRI8mo6AXCwr1Uu3m7gPc3VQOcDT1TV\nw4MuqldJzgS+DLy7qn4w6HqOMuiz0Md6o3OVwA/oXAnxh03bh+i8yEDnyf4nwH7gm8ArBl3zAuv/\nC+AR4M7mtnvQNS90G6b1/Tor6CqgHh+DAB8D7gG+C2wfdM0LrP8c4K/pXCF0J3DRoGvuqv3zwMPA\nc3T+2r8C+C3gt7p+9tc22/bdFfjcma/+TwKPd/3+7h10zd033wksSS11vB8CkiQtkgEgSS1lAEhS\nSxkAktRSBoAkrRDzfbjctL7XJLmzuf0gyYLfmOhVQJK0QiR5M/AUnc8/eu0C5vsd4PVV9W8Xsj73\nACRphagZPlwuySuT/HmSO5L8ZZJ/PMOsl9F5T8KCHO+fBSRJJ7qddN5Ydl+SNwKfAN46ObH5jpCz\ngK8udMEGgCStUEleQOc7Bf6k61Ps107rth34YlWNL3T5BoAkrVxDwM+q6tw5+mwHrlzswiVJK1BV\n/Rx4IMk74fBXZB7+WskkrwLWA3+7mOUbAJK0QiT5PJ0
X81clGU1yBfAu4Iok3wH2MfUb3y4DbqhF\nXs7pZaCS1FLuAUhSSxkAktRSBoAktZQBIEktZQBIUksZAJLUUgaAJLWUASBJLfX/AWs3F7YcXECM\nAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# plot the fare amount only for the last two highest values\n", "plt.plot(var[:-2])\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZMAAAD8CAYAAACyyUlaAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl8VOXZ//HPJbLLTkBkEdC4gAtCCqhttdhq0FZsqy1W\nKrVYFKFqF1usT2ur9qltH6W1Kj62+hMFWcQNLZbiVlsXNAn7HgEhsgXDKrIkXL8/5k6fMQ1kJtuZ\nyXzfr9e8cuY69zn3xWHgyjn3mfuYuyMiIlITR0WdgIiIpD8VExERqTEVExERqTEVExERqTEVExER\nqTEVExERqTEVExERqTEVExERqTEVExERqbGjo06gvnTs2NF79uwZdRoiImklPz9/m7tnVdUuY4pJ\nz549ycvLizoNEZG0YmYfJNJOl7lERKTGVExERKTGEi4mZtbIzOab2YvhfS8zm2dmq81supk1CfGm\n4X1hWN8zbh+3hvhKM7soLp4bYoVmNj4unnQfIiJS/5I5M7kJWB73/rfABHfPBrYDo0J8FLDd3U8E\nJoR2mFkfYDjQF8gFHgwFqhHwADAU6ANcGdom3YeIiEQjoWJiZt2AS4C/hPcGDAFmhiaTgMvC8rDw\nnrD+gtB+GDDN3fe7+1qgEBgYXoXuvsbdDwDTgGHV7ENERCKQ6JnJH4CfAIfC+w7ADncvDe+LgK5h\nuSuwASCs3xna/zteYZvDxavTh4iIRKDKYmJmXwa2unt+fLiSpl7FutqKV9X/v5nZaDPLM7O84uLi\nSjYREZHakMiZybnApWa2jtglqCHEzlTamln591S6ARvDchHQHSCsbwOUxMcrbHO4+LZq9PEp7v6w\nu+e4e05WVpXfuRERaXD++PJqlm7cWef9VFlM3P1Wd+/m7j2JDaC/6u5XAa8Bl4dmI4Hnw/Ks8J6w\n/lWPPWh+FjA83InVC8gG3gXeA7LDnVtNQh+zwjbJ9iEiIsHT+UVMeHkVLy3eXOd91eQb8D8FppnZ\nXcB84JEQfwR4wswKiZ0tDAdw96VmNgNYBpQCY929DMDMxgFzgEbAo+6+tDp9iIhIzIrNu7jtucWc\n3bsDN38xu877s0z5hT4nJ8c1nYqIZIJd+w4y7P43+Xh/KX+98XNktWpa7X2ZWb6751TVLmPm5hIR\nyQTuzk+eWsT6kr1M/d7gGhWSZGg6FRGRBuSRf63lb0s3Mz73FAb2al9v/aqYiIg0EO+uLeE3L60g\nt++xXPu5XvXat4qJiEgDsHX3PsY9WUCP9i343RVnUN+TgqiYiIikudKyQ9w4dT679h1k4oj+tG7W\nuN5z0AC8iEiau2fuKt5ZU8I9V5zJKce2jiQHnZmIiKSxucu2MPH197lyYA++PqBbZHmomIiIpKn1\nH+3lhzMWcFrX1tz+lT5Vb1CHVExERNLQvoNljJmSz1FmTLxqAM0aN4o0H42ZiIikodufX8rSjbt4\n9Ds5dG/fIup0dGYiIpJuZuRtYHreBsZ94USGnNI56nQAFRMRkbSydONOfv7cEs49sQM/+NJJUafz\nbyomIiJpYuc
nB7lhSgHtWjThj8PPotFRqfO0co2ZiIikAXfnlqcW8uH2T5h+3WA6HlM/EzgmSmcm\nIiJp4OE31vD3ZVu49eJTGXB8/U3gmCgVExGRFDdvzUf8bs5KLjm9C989t2fU6VSqymJiZs3M7F0z\nW2hmS83sVyH+mJmtNbMF4dUvxM3M7jOzQjNbZGb94/Y10sxWh9fIuPgAM1sctrnPwgxlZtbezOaG\n9nPNrF1VfYiINCRbd+1j3NT5HN++BXd//fR6n8AxUYmcmewHhrj7mUA/INfMBod1t7h7v/BaEGJD\niT3fPRsYDUyEWGEAbgcGAQOB28uLQ2gzOm673BAfD7zi7tnAK+H9YfsQEWlISssOMW7qfPbsK2Xi\niAG0imACx0RVWUw8Zk942zi8jvSs32HA42G7d4C2ZtYFuAiY6+4l7r4dmEusMHUBWrv72x57hvDj\nwGVx+5oUlidViFfWh4hIg/H7OStjzyj52umcfGyrqNM5ooTGTMyskZktALYSKwjzwqpfh8tME8ys\n/NaCrsCGuM2LQuxI8aJK4gCd3X0TQPjZqYo+REQahDlLN/O/b6xhxOAeXHZW6v/3llAxcfcyd+8H\ndAMGmtlpwK3AKcBngPbAT0Pzyi7oeTXiR5LQNmY22szyzCyvuLi4il2KiKSGdds+5sczFnJmtzb8\n/MvRTuCYqKTu5nL3HcDrQK67bwqXmfYD/4/YOAjEzhK6x23WDdhYRbxbJXGALeWXr8LPrVX0UTHf\nh909x91zsrKykvmjiohEIjaBYwGNGhkPXNWfpkdHO4FjohK5myvLzNqG5ebAF4EVcf/JG7GxjCVh\nk1nA1eGOq8HAznCJag5woZm1CwPvFwJzwrrdZjY47Otq4Pm4fZXf9TWyQryyPkRE0trPn1vCis27\nmPDNfnRrF/0EjolK5BvwXYBJZtaIWPGZ4e4vmtmrZpZF7JLTAuD60H42cDFQCOwFrgFw9xIzuxN4\nL7S7w91LwvIY4DGgOfBSeAHcDcwws1HAeuCKI/UhIpLOpr+3nqfyi7hxyIl84eROVW+QQix2A1XD\nl5OT43l5eVGnISJSqSUf7uRrE99iUK/2PHbNwJSZd8vM8t09p6p2+ga8iEjEdu49yJgp+XRo2YQ/\nfLNfyhSSZGiiRxGRCB065PzoqQVs3rmP6dedTYcUm8AxUTozERGJ0ENvvM/Ly7dy28Wn0r9Hu6o3\nSFEqJiIiEXnr/W38z5yVfOXM4xh5Ts+o06kRFRMRkQhs2bWPG6fOp1fHltz9tdSdwDFRGjMREaln\nB8sOMe7JAvYeKGPq9wbTsmn6/1ec/n8CEZE087u/reC9ddv54/B+ZHdO7QkcE6XLXCIi9eilxZv4\n8z/XcvXZxzOsX+pP4JgoFRMRkXqypngPt8xcRL/ubbntklOjTqdWqZiIiNSDTw6UccOUAhqn2QSO\nidKYiYhIHXN3bntuMSu37GbSNQPp2rZ51CnVOp2ZiIjUsanvbuCZgg+56YJsPn9Sw3wchoqJiEgd\nWly0k1/OWsrnT8rixiHZUadTZ1RMRETqyI69BxgzJZ+Ox8QmcDwqDSdwTJTGTERE6sChQ84Ppi9g\ny659PHX9ObRv2STqlOqUzkxEROrAg68X8trKYn7+5T7069426nTqXCKP7W1mZu+a2UIzW2pmvwrx\nXmY2z8xWm9l0M2sS4k3D+8Kwvmfcvm4N8ZVmdlFcPDfECs1sfFw86T5ERKL2ZuE27p27imH9juPb\ng4+POp16kciZyX5giLufCfQDcsNz138LTHD3bGA7MCq0HwVsd/cTgQmhHWbWBxgO9AVygQfNrFF4\nHPADwFCgD3BlaEuyfYiIRG3zztgEjidkHcNvGsAEjomqsph4zJ7wtnF4OTAEmBnik4DLwvKw8J6w\n/gKLHc1hwDR33+/ua4k9v31geBW6+xp3PwBMA4aFbZLtQ0QkMgfLDjH2yQL2H
Sxj4ogBtGiSOcPS\nCY2ZhDOIBcBWYC7wPrDD3UtDkyKgfJKZrsAGgLB+J9AhPl5hm8PFO1SjDxGRyPxm9gryP9jOby8/\ngxM7HRN1OvUqoWLi7mXu3g/oRuxMorJJZTz8rOwMwWsxfqQ+PsXMRptZnpnlFRcXV7KJiEjt+Oui\nTTz65lq+c05PvnzGcVGnU++SupvL3XcArwODgbZmVn4O1w3YGJaLgO4AYX0boCQ+XmGbw8W3VaOP\nivk+7O457p6TldUwv3UqItF7v3gPP5m5kP492vKzixvWBI6JSuRuriwzaxuWmwNfBJYDrwGXh2Yj\ngefD8qzwnrD+VXf3EB8e7sTqBWQD7wLvAdnhzq0mxAbpZ4Vtku1DRKRe7T1QypjJ+TRt3IgHrupP\nk6Mz8xsXiYwOdQEmhbuujgJmuPuLZrYMmGZmdwHzgUdC+0eAJ8yskNjZwnAAd19qZjOAZUApMNbd\nywDMbBwwB2gEPOruS8O+fppMHyIi9cnd+dkzi1m9dQ9PfHcQXdo0vAkcE2WZ8gt9Tk6O5+XlRZ2G\niDQgT7zzAT9/bgk/+tJJfP+Chjnvlpnlu3tOVe0y83xMRKSGFm7YwZ0vLOMLJ2cx9gsnRp1O5FRM\nRESStP3jA9wwpYCsVk2Z0MAncExU5nyjRkSkFhw65PxgxgKKd+9n5pizaduiYU/gmCidmYiIJOH+\n1wp5fWUxv/hKH87o1vAncEyUiomISILeWFXMhJdX8dWzunLVoB5Rp5NSVExERBKwcccn3DRtPtmd\njuHXXz0tYyZwTJSKiYhIFQ6UHuKGKQUcLPOMm8AxUToiIiJV+O/Zy1mwYQcPXtWfE7IyawLHROnM\nRETkCGYt3Mhjb61j1Gd7cfHpXaJOJ2WpmIiIHEbh1t2Mf3oROce3Y/zQU6JOJ6WpmIiIVOLj/aVc\nP7mAFk0acf+3+tO4kf67PBKNmYiIVODujH9mMWuK9zB51CCObdMs6pRSnkqtiEgFj7/9AS8s3MiP\nLjyZc07sGHU6aUHFREQkTsH67dz112VccEonxpx3QtTppA0VExGRoOTjA4ybUkDn1s249xuawDEZ\nGjMREQHKDjk3TZvPto8P8MyYc2jTonHUKaWVRB7b293MXjOz5Wa21MxuCvFfmtmHZrYgvC6O2+ZW\nMys0s5VmdlFcPDfECs1sfFy8l5nNM7PVZjY9PL6X8Ijf6aH9PDPrWVUfIiLVcd8rq/nn6m386tK+\nnNa1TdTppJ1ELnOVAj9y91OBwcBYM+sT1k1w937hNRsgrBsO9AVygQfNrFF47O8DwFCgD3Bl3H5+\nG/aVDWwHRoX4KGC7u58ITAjtDttHtY+CiGS011du5b5XV/P1/t0Y/pnuUaeTlqosJu6+yd0LwvJu\nYDnQ9QibDAOmuft+d18LFAIDw6vQ3de4+wFgGjDMYrOlDQFmhu0nAZfF7WtSWJ4JXBDaH64PEZGk\nFG3fy83TF3By51bcdZkmcKyupAbgw2Wms4B5ITTOzBaZ2aNm1i7EugIb4jYrCrHDxTsAO9y9tEL8\nU/sK63eG9ofbl4hIwvaXljF2SgFlYQLH5k10gaO6Ei4mZnYM8DRws7vvAiYCJwD9gE3APeVNK9nc\nqxGvzr4q5jzazPLMLK+4uLiSTUQkk9314nIWFu3k91ecQa+OLaNOJ60lVEzMrDGxQjLF3Z8BcPct\n7l7m7oeAP/N/l5mKgPiLjt2AjUeIbwPamtnRFeKf2ldY3wYoOcK+PsXdH3b3HHfPycrKSuSPKiIZ\n4vkFH/LEOx8w+vO9yT1NEzjWVCJ3cxnwCLDc3e+Ni8cf/a8CS8LyLGB4uBOrF5ANvAu8B2SHO7ea\nEBtAn+XuDrwGXB62Hwk8H7evkWH5cuDV0P5wfYiIVGnVlt2Mf3oxA3u25ycXnRx1Og1CIt8zORf4\nNrDYzBaE2M+I3Y3Vj9jlpXXAdQDuvtTMZ
gDLiN0JNtbdywDMbBwwB2gEPOruS8P+fgpMM7O7gPnE\nihfh5xNmVkjsjGR4VX2IiBzJnv2lXD85n5ZNj+b+b53F0ZrAsVZY7Bf9hi8nJ8fz8vKiTkNEIuTu\njJs6n5cWb2LKtYM5+4QOUaeU8sws391zqmqnkiwiGeOxt9bx10WbuOWiU1RIapmKiYhkhPwPSvj1\nX5fzxVM7c/15vaNOp8FRMRGRBm/bnv2MnTKf49o2555vnKkvJtYBTfQoIg1a+QSOJXvDBI7NNYFj\nXdCZiYg0aH94eRVvFn7EXcNO0wSOdUjFREQarNdWbOVPrxbyjZxufEMTONYpFRMRaZA2lMQmcOzT\npTV3DDst6nQaPBUTEWlw9peWMfbJAg65M3FEf5o11gSOdU0D8CLS4NzxwjIWFe3k4W8P4PgOmsCx\nPujMREQalGcKipgybz3XndebC/seG3U6GUPFREQajBWbd/GzZxczqFd7brlQEzjWJxUTEWkQdu87\nyJjJBbRq1pg/aQLHeqcxExFJe+7OT2YuYn3JXp68dhCdWjWLOqWMo9ItImnvkX+t5aUlm/lp7skM\n6q0JHKOgYiIiaS1vXQl3v7SCi/p25nuf0wSOUVExEZG0Vbx7P2OfLKBbu+b8/gpN4BilRB7b293M\nXjOz5Wa21MxuCvH2ZjbXzFaHn+1C3MzsPjMrNLNFZtY/bl8jQ/vVZjYyLj7AzBaHbe4LjwquVh8i\nkhlKyw5x49T57Nh7kAevGkDrZprAMUqJnJmUAj9y91OBwcBYM+sDjAdecfds4JXwHmAosWeyZwOj\ngYkQKwzA7cAgYCBwe3lxCG1Gx22XG+JJ9SEimePeuat4e81H3HXZafQ5rnXU6WS8KouJu29y94Kw\nvBtYDnQFhgGTQrNJwGVheRjwuMe8A7Q1sy7ARcBcdy9x9+3AXCA3rGvt7m977BnCj1fYVzJ9iEgG\neHnZFh58/X2Gf6Y7V+RoAsdUkNSYiZn1BM4C5gGd3X0TxAoO0Ck06wpsiNusKMSOFC+qJE41+hCR\nBm79R3v54YwFnNa1Nb+8tG/U6UiQcDExs2OAp4Gb3X3XkZpWEvNqxI+YTiLbmNloM8szs7zi4uIq\ndikiqW7fwTJueDIfgIlXDdAEjikkoWJiZo2JFZIp7v5MCG8pv7QUfm4N8SIg/ryzG7Cxini3SuLV\n6eNT3P1hd89x95ysrKxE/qgiksJ+9cJSlny4iwnf7Ef39i2iTkfiJHI3lwGPAMvd/d64VbOA8juy\nRgLPx8WvDndcDQZ2hktUc4ALzaxdGHi/EJgT1u02s8Ghr6sr7CuZPkSkgZqZX8TUdzdww/kncMGp\nnaNORypIZDqVc4FvA4vNbEGI/Qy4G5hhZqOA9cAVYd1s4GKgENgLXAPg7iVmdifwXmh3h7uXhOUx\nwGNAc+Cl8CLZPkSkYVq2cRe3PbuYs3t34IdfOinqdKQSFruBquHLycnxvLy8qNMQkSTt2neQS//0\nL/YeKOOvN36OrFZNo04po5hZvrvnVNVOEz2KSMpyd348YyEbtn/CtNGDVUhSmKZTEZGU9ed/ruHv\ny7Zw69BT+EzP9lGnI0egYiIiKWnemo/47d9WMvS0Yxn12V5RpyNVUDERkZSzdfc+xk2dz/HtW/C7\ny8/QBI5pQGMmIpJSSssO8f0n57N730GeGDWQVprAMS2omIhISvmfv69i3toS7v3GmZxyrCZwTBe6\nzCUiKePvSzfz0D/e51uDevC1/t2q3kBShoqJiKSEDz76mB89tZDTu7bhF1/uE3U6kiQVExGJ3L6D\nZVw/uYCjzHjwqv6awDENacxERCL3i+eXsHzTLh79To4mcExTOjMRkUjNeG8DM/KK+P6QExlyiiZw\nTFcqJiISmaUbd/Lz55fw2RM7cvMXNYFjOlMxEZFI7PzkIGMmF9CuRRP+OLwfjY7SFxPTmcZMRKTe\nHTrk/
GjGQjbu+ITp151Nh2M0gWO605mJiNS7/31jDS8v38LPLj6VAce3izodqQUqJiJSr95+/yN+\nP2cFl5zRhWvO7Rl1OlJLEnls76NmttXMlsTFfmlmH5rZgvC6OG7drWZWaGYrzeyiuHhuiBWa2fi4\neC8zm2dmq81supk1CfGm4X1hWN+zqj5EJLVt3bWP70+dT8+OLfnt1zWBY0OSyJnJY0BuJfEJ7t4v\nvGYDmFkfYDjQN2zzoJk1MrNGwAPAUKAPcGVoC/DbsK9sYDswKsRHAdvd/URgQmh32D6S+2OLSH07\nWHaIcU/O5+P9pTw0YgDHNNWQbUNSZTFx9zeAkqraBcOAae6+393XEntG+8DwKnT3Ne5+AJgGDLPY\nryVDgJlh+0nAZXH7mhSWZwIXhPaH60NEUtjv56zk3XUl3P310zmpc6uo05FaVpMxk3FmtihcBisf\nQesKbIhrUxRih4t3AHa4e2mF+Kf2FdbvDO0Pty8RSVF/W7KZh99Yw7cHH8+wfvrn2hBVt5hMBE4A\n+gGbgHtCvLILoF6NeHX29R/MbLSZ5ZlZXnFxcWVNRKSOrd32Mbc8tZAzu7flv758atTpSB2pVjFx\n9y3uXubuh4A/83+XmYqA7nFNuwEbjxDfBrQ1s6MrxD+1r7C+DbHLbYfbV2V5PuzuOe6ek5WVVZ0/\nqojUwCcHyhgzOZ9GjYwHvnUWTY/W8GZDVa1iYmZd4t5+FSi/02sWMDzcidULyAbeBd4DssOdW02I\nDaDPcncHXgMuD9uPBJ6P29fIsHw58Gpof7g+RCSFuDv/9dwSVm7ZzR++2Y9u7TSBY0NW5e0UZjYV\nOB/oaGZFwO3A+WbWj9jlpXXAdQDuvtTMZgDLgFJgrLuXhf2MA+YAjYBH3X1p6OKnwDQzuwuYDzwS\n4o8AT5hZIbEzkuFV9SEiqWPaext4uqCIGy/I5vyTO0WdjtQxi/2y3/Dl5OR4Xl5e1GmIZIQlH+7k\naxPfYlCv9jx2zUDNu5XGzCzf3XOqaqdvwItIrdq59yDXT86nY8sm/HH4WSokGULfGhKRWnPokPPD\nGQvYsmsfM647m/Ytm0SdktQTnZmISK2Z+I/3eWXFVv7rkj6c1UMTOGYSFRMRqRVvFm7jnr+v5Ctn\nHsfVZx8fdTpSz1RMRKTGNu/cx41T59M76xju/trpmsAxA2nMRERqJDaBYwGfHCxj+oj+tNQEjhlJ\nf+siUiN3v7SCvA+2c9+VZ3FiJ03gmKl0mUtEqm324k088q+1jDz7eC4987io05EIqZiISLWsKd7D\nT2Yuol/3ttx2SZ+qN5AGTcVERJK290ApYyYX0OToo3jwqv40OVr/lWQ6jZmISFLcndueXcKqrbt5\n/LsDOa5t86hTkhSgXydEJClT5q3n2fkfcvMFJ/G5bD3aQWJUTEQkYYuKdnDHC8s476Qsvj/kxKjT\nkRSiYiIiCdn+8QHGTC4gq1VT/vDNfhylCRwljsZMRKRKhw45P5ixgK279/HU9efQThM4SgU6MxGR\nKj3wWiGvryzmF1/pS7/ubaNOR1JQlcXEzB41s61mtiQu1t7M5prZ6vCzXYibmd1nZoVmtsjM+sdt\nMzK0X21mI+PiA8xscdjmPguT+lSnDxGpff9avY17X17FZf2OY8SgHlGnIykqkTOTx4DcCrHxwCvu\nng28Et4DDCX2TPZsYDQwEWKFgdjjfgcBA4Hby4tDaDM6brvc6vQhIrVv085PuHHafLI7HcN/awJH\nOYIqi4m7v0HsGezxhgGTwvIk4LK4+OMe8w7Q1sy6ABcBc929xN23A3OB3LCutbu/7bHnBz9eYV/J\n9CEitehA6SFumFLA/oNlTBwxgBZNNMQqh1fdMZPO7r4JIPzsFOJdgQ1x7YpC7Ejxokri1elDRGrR\nf89ezvz1O/jd5WdyQtYxUacjKa62B+ArOwf2asSr08d/NjQbbWZ5ZpZ
XXFxcxW5FpNwLCzfy2Fvr\nuObcnlxyhk78pWrVLSZbyi8thZ9bQ7wI6B7XrhuwsYp4t0ri1enjP7j7w+6e4+45WVn6pq5IIgq3\n7mH804vo36Mttw49Nep0JE1Ut5jMAsrvyBoJPB8XvzrccTUY2BkuUc0BLjSzdmHg/UJgTli328wG\nh7u4rq6wr2T6EJEa+nh/KWMm59O0cSMe0ASOkoQqR9TMbCpwPtDRzIqI3ZV1NzDDzEYB64ErQvPZ\nwMVAIbAXuAbA3UvM7E7gvdDuDncvH9QfQ+yOsebAS+FFsn2ISM24Oz97djHvF+/hiVGD6NJGEzhK\n4ix2E1XDl5OT43l5eVGnIZKynnh7HT9/fik/vvAkxg3JjjodSRFmlu/uOVW10zmsiLBgww7ueHEZ\nQ07pxA3nawJHSZ6KiUiGK/n4ADdMzqdz62bc+40zNYGjVIu+hSSSwcoOOTdPX8C2PQeYOeZs2rbQ\nBI5SPTozEclgf3p1NW+sKub2S/twRjdN4CjVp2IikqH+saqYP76ymq+d1ZVvDdQEjlIzKiYiGejD\nHZ9w87T5nNSpFb/+qiZwlJpTMRHJMAdKDzF2SgEHy5yJI/rTvEmjqFOSBkAD8CIZ5td/XcaCDTt4\naER/emsCR6klOjMRySDPL/iQSW9/wLWf7UXuaZrAUWqPiolIhli9ZTfjn17MZ3q246dDT4k6HWlg\nVExEMsCe/aVcPzmflk0bcf+3+tO4kf7pS+3SmIlIA+fujH96EWu3fczkawfRuXWzqFOSBki/nog0\ncJPeWseLizbxowtP5pwTOkadjjRQKiYiDVjB+u38evZyvnhqJ8acd0LU6UgDpmIi0kB9tGc/Y6cU\ncGybZtxzRT9N4Ch1SmMmIg1Q+QSOH318gGfGnEObFo2jTkkauBqdmZjZOjNbbGYLzCwvxNqb2Vwz\nWx1+tgtxM7P7zKzQzBaZWf+4/YwM7Veb2ci4+ICw/8KwrR2pDxGJ+ePLq/jn6m3ccWlfTuvaJup0\nJAPUxmWuL7h7v7gncY0HXnH3bOCV8B5gKJAdXqOBiRArDMQeBTwIGAjcHlccJoa25dvlVtGHSMZ7\nbeVW7nu1kMsHdOObn+kedTqSIepizGQYMCksTwIui4s/7jHvAG3NrAtwETDX3UvcfTswF8gN61q7\n+9see7bw4xX2VVkfIhmtaPtefjB9Aacc24o7h52mCRyl3tS0mDjwdzPLN7PRIdbZ3TcBhJ+dQrwr\nsCFu26IQO1K8qJL4kfoQyVj7S8u4YUoBZWXOQyMGaAJHqVc1HYA/1903mlknYK6ZrThC28p+RfJq\nxBMWCtxogB499LwGadjufHEZi4p28tCIAfTs2DLqdCTD1OjMxN03hp9bgWeJjXlsCZeoCD+3huZF\nQPwF3G7Axiri3SqJc4Q+Kub3sLvnuHtOVlZWdf+YIinvufkfMvmd9Vz3+d7knnZs1OlIBqp2MTGz\nlmbWqnwZuBBYAswCyu/IGgk8H5ZnAVeHu7oGAzvDJao5wIVm1i4MvF8IzAnrdpvZ4HAX19UV9lVZ\nHyIZZ+Xm3dz6zGIG9mrPLRedHHU6kqFqcpmrM/BsGOA7GnjS3f9mZu8BM8xsFLAeuCK0nw1cDBQC\ne4FrANy9xMzuBN4L7e5w95KwPAZ4DGgOvBReAHcfpg+RjLJ730HGTM6nZdOjuf/KszhaEzhKRCx2\no1TDl5M5BDypAAAIzklEQVST43l5eVGnIVJr3J2xTxYwZ+kWplw7iMG9O0SdkjRAZpYf99WPw9Kv\nMSJp6tE31zF78WZuuehkFRKJnIqJSBrKW1fCb2Yv50t9OnPd53tHnY6IiolIutm2Zz9jnyyga7vm\n/M8VZ+qLiZISNNGjSBopO+TcNG0+O/Ye5JkbPkOb5prAUVKDiolIGpkwdxVvFn7E7y4/g77HaQJH\nSR26zCWSJl5ZvoX7Xyvkmznd+Ua
OJnCU1KJiIpIGNpTEJnDs06U1vxrWN+p0RP6DiolIitt3sIwx\nU/Jx4KERA2jWWBM4SurRmIlIivvVC8tY8uEu/nx1Dj06tIg6HZFK6cxEJIU9nV/E1HfXc/15J/Cl\nPp2jTkfksFRMRFLUis27uO25xQzu3Z4fX3hS1OmIHJGKiUgK2rXvIGMmF9C6WWPu0wSOkgY0ZiKS\nYtydnzy1iPUle5n6vcF0atUs6pREqqRfd0RSzF/+uZa/Ld3M+NxTGNirfdTpiCRExUQkhby7toS7\n/7aC3L7Hcu3nekWdjkjCVExEUsTW3fsY92QB3ds153dXnKEJHCWtpHUxMbNcM1tpZoVmNj7qfESq\nq7TsEDdOnc+ufQeZOGIArZtpAkdJL2lbTMysEfAAMBToA1xpZn2izUqkeu6Zu4p31pRw12Wnc2qX\n1lGnI5K0tC0mwECg0N3XuPsBYBowLOKcRJI2d9kWJr7+PlcO7M7lA7pFnY5ItaTzrcFdgQ1x74uA\nQbXdyT9WFXPni8sSbu/uSe0/qdbJ7TrZ5nWae5K7xpPMPun9J3twktp3cjvf9vEBTuvamtu/ogkc\nJX2lczGpbHTyU/+KzWw0MBqgR48e1erkmKZHc3LnVjXPrJaaJzsom+wQbrJjvsnvP/Etkh5+Tjr3\nJI9lEs2T2XPzJo24/rwTNIGjpLV0LiZFQPxDHboBG+MbuPvDwMMAOTk51fpddMDx7RhwfLvq5igi\nkhHSeczkPSDbzHqZWRNgODAr4pxERDJS2p6ZuHupmY0D5gCNgEfdfWnEaYmIZKS0LSYA7j4bmB11\nHiIimS6dL3OJiEiKUDEREZEaUzEREZEaUzEREZEaUzEREZEas2SnfkhXZlYMfFDNzTsC22oxndqS\nqnlB6uamvJKjvJLTEPM63t2zqmqUMcWkJswsz91zos6jolTNC1I3N+WVHOWVnEzOS5e5RESkxlRM\nRESkxlRMEvNw1AkcRqrmBambm/JKjvJKTsbmpTETERGpMZ2ZiIhIjWV8MTGzXDNbaWaFZja+kvVN\nzWx6WD/PzHrGrbs1xFea2UX1nNcPzWyZmS0ys1fM7Pi4dWVmtiC8anVa/gTy+o6ZFcf1f23cupFm\ntjq8RtZzXhPiclplZjvi1tXl8XrUzLaa2ZLDrDczuy/kvcjM+setq8vjVVVeV4V8FpnZW2Z2Zty6\ndWa2OByvvHrO63wz2xn39/WLuHVH/AzUcV63xOW0JHym2od1dXK8zKy7mb1mZsvNbKmZ3VRJm/r7\nfLl7xr6ITV3/PtAbaAIsBPpUaHMD8FBYHg5MD8t9QvumQK+wn0b1mNcXgBZheUx5XuH9ngiP13eA\n+yvZtj2wJvxsF5bb1VdeFdp/n9gjC+r0eIV9fx7oDyw5zPqLgZeIPZxxMDCvro9XgnmdU94fMLQ8\nr/B+HdAxouN1PvBiTT8DtZ1XhbZfAV6t6+MFdAH6h+VWwKpK/j3W2+cr089MBgKF7r7G3Q8A04Bh\nFdoMAyaF5ZnABWZmIT7N3fe7+1qgMOyvXvJy99fcfW94+w6xJ03WtUSO1+FcBMx19xJ33w7MBXIj\nyutKYGot9X1E7v4GUHKEJsOAxz3mHaCtmXWhbo9XlXm5+1uhX6i/z1cix+twavLZrO286uXz5e6b\n3L0gLO8GlgNdKzSrt89XpheTrsCGuPdF/Odfxr/buHspsBPokOC2dZlXvFHEfvso18zM8szsHTO7\nrJZySiavr4dT6plmVv5o5ZQ4XuFyYC/g1bhwXR2vRBwu97o8Xsmq+Ply4O9mlm9moyPI52wzW2hm\nL5lZ3xBLieNlZi2I/af8dFy4zo+XxS6/nwXMq7Cq3j5faf1wrFpglcQq3t52uDaJbFtdCe/bzEYA\nOcB5ceEe7r7RzHoDr5rZYnd/v57yegGY6u77zex6Ymd1QxLcti7zKjccmOnuZXGxujpeiYji85Uw\
nM/sCsWLy2bjwueF4dQLmmtmK8Jt7fSggNr3HHjO7GHgOyCZFjhexS1xvunv8WUydHi8zO4ZY8brZ\n3XdVXF3JJnXy+cr0M5MioHvc+27AxsO1MbOjgTbETncT2bYu88LMvgjcBlzq7vvL4+6+MfxcA7xO\n7DeWesnL3T+Ky+XPwIBEt63LvOIMp8IliDo8Xok4XO51ebwSYmZnAH8Bhrn7R+XxuOO1FXiW2ru8\nWyV33+Xue8LybKCxmXUkBY5XcKTPV60fLzNrTKyQTHH3ZyppUn+fr9oeFEqnF7EzszXELnuUD9r1\nrdBmLJ8egJ8Rlvvy6QH4NdTeAHwieZ1FbMAxu0K8HdA0LHcEVlNLA5EJ5tUlbvmrwDthuT2wNuTX\nLiy3r6+8QruTiQ2GWn0cr7g+enL4AeVL+PQA6bt1fbwSzKsHsXHAcyrEWwKt4pbfAnLrMa9jy//+\niP2nvD4cu4Q+A3WVV1hf/otmy/o4XuHP/TjwhyO0qbfPV60d6HR9EbvbYRWx/5hvC7E7iP22D9AM\neCr8w3oX6B237W1hu5XA0HrO62VgC7AgvGaF+DnA4vCPaTEwqp7z+g2wNPT/GnBK3LbfDcexELim\nPvMK738J3F1hu7o+XlOBTcBBYr8NjgKuB64P6w14IOS9GMipp+NVVV5/AbbHfb7yQrx3OFYLw9/z\nbfWc17i4z9c7xBW7yj4D9ZVXaPMdYjflxG9XZ8eL2KVHBxbF/T1dHNXnS9+AFxGRGsv0MRMREakF\nKiYiIlJjKiYiIlJjKiYiIlJjKiYiIlJjKiYiIlJjKiYiIlJjKiYiIlJj/x/x0wBum2L7WAAAAABJ\nRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# plotting last three total fare values\n", "plt.plot(var[-3:])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A very sharp increase in fare values can be seen at the second last value" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYAAAAD8CAYAAAB+UHOxAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHbZJREFUeJzt3XtwXOWd5vHvT91S62pdbNmWJRkZMMZmwAYULgO1RQIL\nDmEDU5NkyMwkTopaMlVki2xla4dkd4pclqpM1U4yM1sZZshAhcwmISRkBidFhfWSkAyZ5WLAGGzF\nWNxsWbIl27q0Lt1St377Rx/ZDci2bEt9Wn2eT1WnT7/9tvo9h/g83e97+n3N3RERkegpC7sBIiIS\nDgWAiEhEKQBERCJKASAiElEKABGRiFIAiIhElAJARCSiFAAiIhGlABARiah42A04mWXLlnlHR0fY\nzRARWVRefPHFw+7efKp6RR0AHR0dbN++PexmiIgsKmb2zlzqqQtIRCSiFAAiIhGlABARiSgFgIhI\nRCkAREQi6pQBYGaVZva8mb1iZrvM7KtB+Roze87M9prZj8ysIihPBI+7g+c78v7Wl4LyPWZ200Lt\nlIiInNpcvgGkgQ+5+0ZgE7DZzK4C/hL4lruvBQaBO4L6dwCD7n4+8K2gHma2AbgduAjYDPydmcXm\nc2dERGTuThkAnjMaPCwPbg58CPhJUP4wcFuwfWvwmOD5683MgvJH3D3t7m8B3cAV87IXIiIl5LEX\ne/jBc/sW/H3mNAZgZjEz2wH0A9uAN4Ahd88EVXqA1mC7FdgPEDw/DCzNL5/lNSIiEvjpyz089lLP\ngr/PnALA3bPuvgloI/epff1s1YJ7O8FzJyp/FzO708y2m9n2gYGBuTRPRKSkJFMZ6ioXfqKG07oK\nyN2HgKeBq4AGM5tpYRvQG2z3AO0AwfP1wNH88llek/8eD7h7p7t3NjefcioLEZGSkwuA8gV/n7lc\nBdRsZg3BdhVwA9AF/Ar4WFBtC/B4sL01eEzw/C/d3YPy24OrhNYAa4Hn52tHRERKRTI1VZBvAHN5\nhxbg4eCKnTLgUXf/uZntBh4xs/8BvAw8GNR/EPgnM+sm98n/dgB332VmjwK7gQxwl7tn53d3REQW\nv2QqQ12iCALA3XcCl85S/iazXMXj7ing4yf4W/cB951+M0VEomEyM006M118YwAiIrKwkqkpgOIY\nAxARkcJJpnJX1+sbgIhIxIymcwFQW4AxAAWAiEgRGVEXkIhINKkLSEQkomYCYIm+AYiIRMvMVUC1\n+gYgIhIto+oCEhGJpmQ6Q2V5GeWxhT89KwBERIpIbh6ghe//BwWAiEhRGSnQPECgABARKSqjBVoL\nABQAIiJFRV1AIiIRVajVwEABICJSVJKpTEHmAQIFgIhIURlNF2Y5SFAAiIgUjey0BwGgbwAiIpEy\nMxW0AkBEJGJm5gEqxERwoAAQESkaxxaD0TcAEZFoKeRaAKAAEBEpGoVcEB4UACIiRUPfAEREImpk\nJgD0QzARkWg5vhiMuoBERCIlmZoiXmZUlhfm1KwAEBEpEjMTwZlZQd5PASAiUiSSqamC/QYA5hAA\nZtZuZr8ysy4z22VmdwflXzGzA2a2I7jdnPeaL5lZt5ntMbOb8so3B2XdZnbPwuySiMjiNJrOUJco\nTP8/wFyiJgN80d1fMrM64EUz2xY89y13/5/5lc1sA3A7cBGwCvi/ZnZB8PS3gX8P9AAvmNlWd989\nHzsiIrLYjRRwLQCYQwC4ex/QF2wnzawLaD3JS24FHnH3NPCWmXUDVwTPdbv7mwBm9khQVwEgIkJu\nDKC1oapg73daYwBm1gFcCjwXFH3ezHaa2UNm1hiUtQL7817WE5SdqFxERJhZDrKIxgBmmFkt8Bjw\nBXcfAe4HzgM2kfuG8FczVWd5uZ+k/L3vc6eZbTez7QMDA3NtnojIolfItQBgjgFgZuXkTv7fd/ef\nArj7IXfPuvs08B2Od/P0AO15L28Dek9S/i7u/oC7d7p7Z3Nz8
+nuj4jIouTuBV0PGOZ2FZABDwJd\n7v7NvPKWvGp/ALwWbG8FbjezhJmtAdYCzwMvAGvNbI2ZVZAbKN46P7shIrK4TUxlyU57wX4FDHO7\nCuga4FPAq2a2Iyj7MvBJM9tErhvnbeBzAO6+y8weJTe4mwHucvcsgJl9HngSiAEPufuuedwXEZFF\na2YiuEItCA9zuwroGWbvv3/iJK+5D7hvlvInTvY6EZGoKvRMoKBfAouIFIVCLwcJCgARkaKgbwAi\nIhF1bAxAASAiEi2FXg4SFAAiIkVhNK0uIBGRSJpZDrK2QgEgIhIpydQUtYk4ZWWFWQwGFAAiIkWh\n0NNAgAJARKQojCoARESiKZmeKugVQKAAEBEpCuoCEhGJqGQqU9CJ4EABICJSFHLfANQFJCISOcnU\nFEvUBSQiEi2TmWnSmWmNAYiIRM3MPEAaAxARiZjj8wBpDEBEJFLCWAsAFAAiIqEbCWEqaFAAiIiE\nTt8AREQiSgEgIhJRo+oCEhGJpmPrAesyUBGRaEmmMyTiZVTEC3tKVgCIiIQsmSr8VNCgABARCV0y\nlSn4PECgABARCV0YawGAAkBEJHTJ1BS1CgARkehJpjLUJYpwDMDM2s3sV2bWZWa7zOzuoLzJzLaZ\n2d7gvjEoNzP7WzPrNrOdZnZZ3t/aEtTfa2ZbFm63REQWj9F08XYBZYAvuvt64CrgLjPbANwDPOXu\na4GngscAHwbWBrc7gfshFxjAvcCVwBXAvTOhISISZWGsBgZzCAB373P3l4LtJNAFtAK3Ag8H1R4G\nbgu2bwW+5znPAg1m1gLcBGxz96PuPghsAzbP696IiCwy2WlnNJ0p/jEAM+sALgWeA1a4ex/kQgJY\nHlRrBfbnvawnKDtR+Xvf404z225m2wcGBk6neSIii87MWgBFfRmomdUCjwFfcPeRk1WdpcxPUv7u\nAvcH3L3T3Tubm5vn2jwRkUXp+GIwRRoAZlZO7uT/fXf/aVB8KOjaIbjvD8p7gPa8l7cBvScpFxGJ\nrGRIE8HB3K4CMuBBoMvdv5n31FZg5kqeLcDjeeWfDq4GugoYDrqIngRuNLPGYPD3xqBMRCSywpoI\nDmAu73gN8CngVTPbEZR9GfgG8KiZ3QHsAz4ePPcEcDPQDYwDnwVw96Nm9nXghaDe19z96LzshYjI\nInX8G0ARBoC7P8Ps/fcA189S34G7TvC3HgIeOp0GioiUsuOLwRRhF5CIiCycmQAo6quARERk/ukb\ngIhIRCVTU8TKjMrywp+OFQAiIiGamQo6d8FlYSkARERCFNZEcKAAEBEJVTI1FcpU0KAAEBEJ1Ugq\nnIngQAEgIhKqsNYDBgWAiEioRtNToVwCCgoAEZFQhbUgPCgARERC4+4kU5lQJoIDBYCISGgmprJk\np11dQCIiUTOaCm8xGFAAiIiEZkQBICISTWGuBQAKABGR0IQ5EygoAEREQpNUF5CISDSNpsNbEB4U\nACIioQlzQXhQAIiIhGZEASAiEk3J1BS1iTixssIvBgMKABGR0IyGOA8QKABEREIT5jxAoAAQEQlN\nMj2lbwAiIlGUmwo6nEtAQQEgIhIajQGIiETUiAJARCSakqnwloOEOQSAmT1kZv1m9lpe2VfM7ICZ\n7QhuN+c99yUz6zazPWZ2U1755qCs28zumf9dERFZPCYz06Qz09QV+VVA3wU2z1L+LXffFNyeADCz\nDcDtwEXBa/7OzGJmFgO+DXwY2AB8MqgrIhJJo+lwJ4IDOOU7u/tvzKxjjn/vVuARd08Db5lZN3BF\n8Fy3u78JYGaPBHV3n3aLRURKwPG1AIq4C+gkPm9mO4MuosagrBXYn1enJyg7UbmISCQNjucCYEnV\n4guA+4HzgE1AH/BXQflsE1r4Scrfx8zuNLPtZrZ9YGDgDJsnIlLcXj+YBOC85prQ2nBGAeDuh9w9\n6+7TwHc43s3TA7TnVW0De
k9SPtvffsDdO929s7m5+UyaJyJS9Hb3jVBdEeOcpYssAMysJe/hHwAz\nVwhtBW43s4SZrQHWAs8DLwBrzWyNmVWQGyjeeubNFhFZ3Lr6Rli3si60mUBhDoPAZvZD4DpgmZn1\nAPcC15nZJnLdOG8DnwNw911m9ii5wd0McJe7Z4O/83ngSSAGPOTuu+Z9b0REFgF3Z3ffCP9h46pQ\n2zGXq4A+OUvxgyepfx9w3yzlTwBPnFbrRERK0IGhCZKpDBtaloTaDv0SWESkwLr6cgPA6xUAIiLR\nsrt3BDO4cGVdqO1QAIiIFFhX3wgdS2uoCXEaCFAAiIgUXNfBEda3hPvpHxQAIiIFlUxN8c6R8dAH\ngEEBICJSUHsOFscAMCgAREQKanffCKAAEBGJnK6+ERqqy2mprwy7KQoAEZFC2t2XZP3KJZiFNwXE\nDAWAiEiBZKedPQdHiqL7BxQAIiIF89bhMVJT02xYpQAQEYmUrmMDwOH/BgAUACIiBbO7b4TymLF2\nuQJARCRSuvpGOK+5lop4cZx6i6MVIiIR0NU3UhS/AJ6hABARKYAjo2kOjaSLZgAYFAAiIgVRLGsA\n5FMAiIgUwO6+YUABICISOV19SVYuqaSppiLsphyjABARKYCuvuJYAyCfAkBEZIGlM1m6+0eLqvsH\nFAAiIgtu76FRMtNeVFcAgQJARGTBdRXRGgD5FAAiIgtsd98IVeUxOpbWhN2Ud1EAiIgssK6+Edat\nrCNWFv4aAPkUACIiC8jd6epLFl33DygAREQWVO9wiuGJqaIbAAYFgIjIgurqzQ0Abyiy3wDAHALA\nzB4ys34zey2vrMnMtpnZ3uC+MSg3M/tbM+s2s51mdlnea7YE9fea2ZaF2R0RkeKyO7gCaN3KxfkN\n4LvA5veU3QM85e5rgaeCxwAfBtYGtzuB+yEXGMC9wJXAFcC9M6EhIlKqDo2k+JeXD3Bucw21iXjY\nzXmfUwaAu/8GOPqe4luBh4Pth4Hb8sq/5znPAg1m1gLcBGxz96PuPghs4/2hIiJSMvYdGedjf/9v\nHBpJcd9tF4fdnFmdaSStcPc+AHfvM7PlQXkrsD+vXk9QdqJyEZGSs/dQkj998DlSU9N8/z9exab2\nhrCbNKv5HgSe7SJXP0n5+/+A2Z1mtt3Mtg8MDMxr40REFtqrPcN84h/+H9MOj37u6qI9+cOZB8Ch\noGuH4L4/KO8B2vPqtQG9Jyl/H3d/wN073b2zubn5DJsnIlJ4z791lD/+zrNUV8T58eeuZt3K4rvy\nJ9+ZBsBWYOZKni3A43nlnw6uBroKGA66ip4EbjSzxmDw98agTESkJDy9p59PP/QczUsS/PjPrqZj\nWXFN+zCbU44BmNkPgeuAZWbWQ+5qnm8Aj5rZHcA+4ONB9SeAm4FuYBz4LIC7HzWzrwMvBPW+5u7v\nHVgWEVl0xicz/K9fdvOd37zJBSvq+N4dV7CsNhF2s+bE3Gftii8KnZ2dvn379rCbISLyPu7Ok7sO\n8fWf7+bA0AQfu7yNv7hlA/VV5WE3DTN70d07T1Wv+C5MFREpcm8fHuMrP9vF03sGuHBlHT/+s6v5\nQEdT2M06bQoAEZETcHcms9OkpqZJZ7Kkp6b5yYs93P/rN6iIlfEXt2xgy9XnEI8tzll1FAAiEmnD\n41PsOZRkz6Ekrx9M8vqhJG8MjJJMZUhnpmd9zUc3ruK/fWQ9K5ZUFri180sBICIl68homlcPDNM/\nkubo+CSDY5McnbmNT9I7NMGhkfSx+nWJOGtX1HL9hStoqCknEY+RiJeRiJdRWZ7bPn95LZeuLo2Z\nbBQAIlISRlJTvNYzzCs9w+zsGWJnzzAHhibeVaciXsbSmgoaqytYWlvBNecvY92KOi5YWce6FXW0\n1FdiVlyLtiwkBYCILFpvDozyi10HeXLXIV7ZP3SsfHVTNZeubuAzv9/BxW31tDZUsbS2gqr
yWKRO\n8KeiABCRRcPd2dU7wpO7DvKL1w6yt38UgI1t9fznGy7g0tUNXNxaT2NNRcgtXRwUACJS1GaWVPzZ\nzl5+vrOX/UcnKDO4cs1S/uTK1dx40UpWNVSF3cxFSQEgIkXprcNjbN3Ry8929tLdP0qszLj2/GX8\npw+u5YYNK2jSp/yzpgAQkaIwPD7FS/sG2f7OUX79+gCvHRjBDK7oaOIzt/0eN1/copP+PFMAiEhB\nZbLTDE1McXRsktcODLP9nUFefHuQ1/uTuEOszLi4tZ7//pH1fOSSFlrq1b2zUBQAIjJv3J2DIyne\n6B/jjYFR3hwYZd/RcY6OTzE0nrv+PpnKvOs1dZVxLlvdyC2XtHB5RyOb2huortCpqRB0lEVkztyd\nwfEpeocm6B2aoG84Re/QBAeGJnj7yBhvDowxPpk9Vr82EeecpdU01VRwTlPuvqG6nMbq3P26lXWs\nXV5HrEyXZoZBASAis5qedt46MsbOniFe2T/MKz1D/K4vycRU9l31KmJltDRUsrqpmk90NnHe8lrO\na67h/OZamusSuu6+iCkAROSYZGqK//3sPp7pHmBnz/Cx7pqq8hgXt9bzRx9op72pmtaGSlrqq1jV\nUMXSmgrK9Al+UVIAiAjJ1BTf/e3b/OMzbzE8McXvtS7hoxtXsbGtgUva6zm/uXbRzngpJ6YAEImw\n9574b1i/gruvX8vFbfVhN00KQAEgEkFHRtP88Pl9fOdfdeKPMgWASESkM1l+2dXPYy8d4Ok9/WSm\nnRvWL+fu6y/QiT+iFAAiJczdeaVnmMde7OFnO3sZGp9ieV2CO65dwx9e3sYFK+rCbqKESAEgUmIG\nxyb5tzeO8Ez3YZ7pHmD/0QkS8TJuumglf3h5G9eev0zX3QugABBZ9EbTGV7eN8hvu4/w2+7DvNY7\njHvuR1hXnbuUu647n5svaWFJZXnYTZUiowAQWUSmstPsOZhkx/4hXtk/xCs9Q+ztH8Ud4mXGZasb\n+cL1F3Dt2mVsbKvXpZtyUgoAkZC5O8l0hsPJNAPJNAOj6ePr1o5NcmTs+Fq2bx0eO7ZQeVNNBRvb\n6vnIxavY2F7PBzqaqEnon7TMnf7fInIGstNOf3JmHpwUR0bTjKUzjKazjKUzudtkhvHJLO6z/42x\nyUzuhJ9MHzupv1d9VTlNNRU01VTQ3lTNNecvY1N7A5vaG2hrrNI0C3JWFAASeTMzWP7uYJI9B5Mc\nHE4x7U5m2pmePn4/mZ2mfyTNgaEJDo2kyEy//8xeHjNqEnFqKuLUJuJUVcROOOBaXRGj85xGmusS\nx2+1lSyry53wG6srKFcXjiwgBYBERmoqS8/gBD2D4+wfnGDvoeSxk/7wxNSxenWJOPGYESsLbmaU\nlRnlsTKa6xJcsaaJVQ2VrGrIzYXT2lDFstoENYkYiXgsxD0UOT0KAClJqaksW3f08uu9A/QMTnBg\ncJzDo5PvqlObiHPBilpuvriF9S11rFtRx7qVdTRUa9UpiQYFgJSUgWSaf3r2Hb7/7DscGZuktaGK\nc5trWL9+BW2NVbQ2VtHWWE1rQxUt9ZXqQ5dIO6sAMLO3gSSQBTLu3mlmTcCPgA7gbeAT7j5ouX9p\nfwPcDIwDn3H3l87m/UVm7O4d4aHfvsXWHb1MZqe5/sLl3HHtGq4+b6lO8iInMB/fAD7o7ofzHt8D\nPOXu3zCze4LHfw58GFgb3K4E7g/uRU6Lu9MzOMGrB4bZ2TPM9rePsv2dQarKY/zRB9r57DUdnNtc\nG3YzRYreQnQB3QpcF2w/DDxNLgBuBb7n7g48a2YNZtbi7n0L0AZZhNydA0MTHB2bZHwyy8RklvHJ\nLOOTGSamshwaSfHqgRFe7RlicDw3aFseM9atrOPPN1/IJ69oV/+9yGk42wBw4P+YmQP/4O4PACtm\nTuru3mdmy4O6rcD+vNf2BGUKgIgaSU2xc/8wO/YP8vK
+IXbsH+LI2OQJ68fKjAtW1HHjhpVc3FbP\nJW31rFtZpytvRM7Q2QbANe7eG5zkt5nZ705Sd7aO2PddSG1mdwJ3AqxevfosmycLzd0Zn8zSN5zi\n4HCK3uEJDg6n6BvOLRiemjr+QygP/sfJLSz+xsDosefOX17LBy9czqb2BlYuqaS6IkZVRYzqivix\n7brKuE72IvPorALA3XuD+34z+2fgCuDQTNeOmbUA/UH1HqA97+VtQO8sf/MB4AGAzs7OE/yGUgpl\nLJ3hZ6/08uMXe+gbmmAy60xlp/Nus/8nWlZbwYolldRUxMFy6W+AlYFRRsfSGj66cRWXrm7gkrYG\n6qs0UZlIoZ1xAJhZDVDm7slg+0bga8BWYAvwjeD+8eAlW4HPm9kj5AZ/h9X/X7xeOzDMD57fx+Mv\nH2BsMsva5bX8/vnLqIiXURErozyW+2FUPFZGVXmMlfWJ3CLh9VUsX5Kgslyf1EWK3dl8A1gB/HNw\niV0c+IG7/8LMXgAeNbM7gH3Ax4P6T5C7BLSb3GWgnz2L95Z5NjMA+5vXD/PD5/fx6oFhEvEybrlk\nFX98ZTuXrW7U5ZQiJeaMA8Dd3wQ2zlJ+BLh+lnIH7jrT95P5lT8Au2P/MDv2D3F4NA3AuhV1fPWj\nF3Hbplbqq9U1I1Kq9EvgRWoyM00yNUUylWEyO407TLsz7X5sezIzTX8yTd9wikMjqdx9MFB7YGji\n2ADsuc01/LsLcrNMXn5OIxtalujTvkgEKABCMDGZZSQ1dewEPnMbTecej6QyJFNTjM48l86vl9s+\n0fTBJ5KIl7GyvpKVSyq5/JxGPtHZzqb2Bja2NehTvkhEKQDmSTqTZTSVYSydJZnOnbwPj04G88UH\nt8Hcff7MkydSm4hTV5m71SbiNFZXsLqpmrrK8lz5sefLSZSXUWZGmYGZHduOlRkrluRO+g3V5fpU\nLyLvEokAcHfSmelgoY4s6Uz2+HPvqpfrWkllsqSnpklnsqSC+9F07oR+dCy3WlNuO7dS00w3zInU\nJuK0NlSxqqGSy85poKW+iobqcmoTcZYEJ/Ta4GReVxmntiJOmRbtFpEFVpIBcGQ0ze0PPBus0JRb\nlWm2xTtOlxk0VJWztDZBU00Fa5fX0lRTcfzEncjdaoJP543VFbQ2VrGkMq5P3yJSdEoyAKoqYpzX\nXEtNIk5tIpZboSk4OVdXxKgsj5F/Pra8Hykn4mUkysuoLI/ltuMxKsvLqEnEaagq1yLbIlIySjIA\nqivi/P2nLg+7GSIiRU0fZ0VEIkoBICISUQoAEZGIUgCIiESUAkBEJKIUACIiEaUAEBGJKAWAiEhE\nmXvxrrpoZgPAO2fxJ5YBh+epOYtR1PcfdAxAxwCidwzOcffmU1Uq6gA4W2a23d07w25HWKK+/6Bj\nADoGoGNwIuoCEhGJKAWAiEhElXoAPBB2A0IW9f0HHQPQMQAdg1mV9BiAiIicWKl/AxARkRMoyQAw\ns81mtsfMus3snrDbUwhm9pCZ9ZvZa3llTWa2zcz2BveNYbZxoZlZu5n9ysy6zGyXmd0dlEfiOJhZ\npZk9b2avBPv/1aB8jZk9F+z/j8ysIuy2LjQzi5nZy2b28+Bx5I7BXJRcAJhZDPg28GFgA/BJM9sQ\nbqsK4rvA5veU3QM85e5rgaeCx6UsA3zR3dcDVwF3Bf/to3Ic0sCH3H0jsAnYbGZXAX8JfCvY/0Hg\njhDbWCh3A115j6N4DE6p5AIAuALodvc33X0SeAS4NeQ2LTh3/w1w9D3FtwIPB9sPA7cVtFEF5u59\n7v5SsJ0kdwJoJSLHwXNGg4flwc2BDwE/CcpLdv9nmFkb8BHgH4PHRsSOwVyVYgC0AvvzHvcEZVG0\nwt37IHdyBJaH3J6CMbMO4FLgOSJ0HIKujx1AP7ANeAMYcvdMUCUK/x7+GvivwHTweCnROwZzUooB\nYLOU6VKnCDGzWuA
x4AvuPhJ2ewrJ3bPuvgloI/dteP1s1QrbqsIxs1uAfnd/Mb94lqolewxORyku\nCt8DtOc9bgN6Q2pL2A6ZWYu795lZC7lPhSXNzMrJnfy/7+4/DYojdxzcfcjMniY3FtJgZvHgE3Cp\n/3u4Bviomd0MVAJLyH0jiNIxmLNS/AbwArA2GPWvAG4HtobcprBsBbYE21uAx0Nsy4IL+nofBLrc\n/Zt5T0XiOJhZs5k1BNtVwA3kxkF+BXwsqFay+w/g7l9y9zZ37yD3b/+X7v4nROgYnI6S/CFYkP5/\nDcSAh9z9vpCbtODM7IfAdeRmPTwE3Av8C/AosBrYB3zc3d87UFwyzOxa4F+BVzne//tlcuMAJX8c\nzOwScgOcMXIf7h5196+Z2bnkLoZoAl4G/tTd0+G1tDDM7Drgv7j7LVE9BqdSkgEgIiKnVopdQCIi\nMgcKABGRiFIAiIhElAJARCSiFAAiIhGlABARiSgFgIhIRCkAREQi6v8DwcnIv/ZPRFAAAAAASUVO\nRK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# plot the last 50 values, excluding the last two\n", "plt.plot(var[-50:-2])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Observations:**\n", "* Excluding the last two points, the values again rise drastically at a fare of around 1000.\n", "* So we keep only the data points whose fare is greater than 0 and less than 1000 dollars." ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "collapsed": true }, "outputs": [], "source": [ "frame_with_durations_modified=frame_with_durations[(frame_with_durations.total_amount>0) & (frame_with_durations.total_amount<1000)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calculate the fraction of data points left after removing all the erroneous/outlier data points" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Remove all outliers/erroneous points." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# removing all outliers based on our univariate analysis above\n", "def remove_outliers(new_frame):\n", "\n", "    a = new_frame.shape[0]\n", "    print(\"Number of pickup records = \",a)\n", "    # count each kind of outlier separately, always against the unfiltered frame\n", "    temp_frame = new_frame[((new_frame.dropoff_longitude >= -74.15) & (new_frame.dropoff_longitude <= -73.7004) &\\\n", "                       (new_frame.dropoff_latitude >= 40.5774) & (new_frame.dropoff_latitude <= 40.9176)) & \\\n", "                       ((new_frame.pickup_longitude >= -74.15) & (new_frame.pickup_latitude >= 40.5774)& \\\n", "                       (new_frame.pickup_longitude <= -73.7004) & (new_frame.pickup_latitude <= 40.9176))]\n", "    b = temp_frame.shape[0]\n", "    print(\"Number of outlier coordinates lying outside NY boundaries:\",(a-b))\n", "\n", "\n", "    temp_frame = new_frame[(new_frame.trip_times > 0) & (new_frame.trip_times < 720)]\n", "    c = temp_frame.shape[0]\n", "    print(\"Number of outliers from trip times analysis:\",(a-c))\n", "\n", "\n", "    temp_frame = new_frame[(new_frame.trip_distance > 0) & (new_frame.trip_distance < 23)]\n", "    d = temp_frame.shape[0]\n", "    print(\"Number of outliers from trip distance analysis:\",(a-d))\n", "\n", "    # note: this count uses the looser range 0 <= Speed <= 65,\n", "    # while the filter applied further below keeps only 0 < Speed < 45.31\n", "    temp_frame = new_frame[(new_frame.Speed <= 65) & (new_frame.Speed >= 0)]\n", "    e = temp_frame.shape[0]\n", "    print(\"Number of outliers from speed analysis:\",(a-e))\n", "\n", "    temp_frame = new_frame[(new_frame.total_amount <1000) & (new_frame.total_amount >0)]\n", "    f = temp_frame.shape[0]\n", "    print(\"Number of outliers from fare analysis:\",(a-f))\n", "\n", "\n", "    # apply all the filters one after another to get the cleaned frame\n", "    new_frame = new_frame[((new_frame.dropoff_longitude >= -74.15) & (new_frame.dropoff_longitude <= -73.7004) &\\\n", "                       (new_frame.dropoff_latitude >= 40.5774) & (new_frame.dropoff_latitude <= 40.9176)) & \\\n", "                       ((new_frame.pickup_longitude >= -74.15) & (new_frame.pickup_latitude >= 40.5774)& \\\n", "                       (new_frame.pickup_longitude <= -73.7004) & (new_frame.pickup_latitude <= 40.9176))]\n", "\n", 
"    new_frame = new_frame[(new_frame.trip_times > 0) & (new_frame.trip_times < 720)]\n", "    new_frame = new_frame[(new_frame.trip_distance > 0) & (new_frame.trip_distance < 23)]\n", "    new_frame = new_frame[(new_frame.Speed < 45.31) & (new_frame.Speed > 0)]\n", "    new_frame = new_frame[(new_frame.total_amount <1000) & (new_frame.total_amount >0)]\n", "\n", "    print(\"Total outliers removed\",a - new_frame.shape[0])\n", "    print(\"---\")\n", "    return new_frame" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Removing outliers in the month of Jan-2015\n", "----\n", "Number of pickup records = 12748986\n", "Number of outlier coordinates lying outside NY boundaries: 293919\n", "Number of outliers from trip times analysis: 23889\n", "Number of outliers from trip distance analysis: 92597\n", "Number of outliers from speed analysis: 24473\n", "Number of outliers from fare analysis: 5275\n", "Total outliers removed 377910\n", "---\n", "fraction of data points that remain after removing outliers 0.9703576425607495\n" ] } ], "source": [ "print(\"Removing outliers in the month of Jan-2015\")\n", "print(\"----\")\n", "frame_with_durations_outliers_removed = remove_outliers(frame_with_durations)\n", "print(\"fraction of data points that remain after removing outliers\", float(len(frame_with_durations_outliers_removed))/len(frame_with_durations))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To free memory, delete the dataframe \"frame_with_durations\" by uncommenting and executing the cell below. Keeping only the relevant data structures in memory avoids unnecessary load on RAM." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# del frame_with_durations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Data preparation\n", "## Clustering/Segmentation" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[ 40.75011063, -73.99389648],\n", "       [ 40.72424316, -74.00164795],\n", "       [ 40.80278778, -73.96334076],\n", "       [ 40.7138176 , -74.00908661],\n", "       [ 40.76242828, -73.97117615],\n", "       [ 40.77404785, -73.87437439],\n", "       [ 40.72600937, -73.98327637],\n", "       [ 40.7341423 , -74.00266266],\n", "       [ 40.64435577, -73.78304291],\n", "       [ 40.76794815, -73.98558807]])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a dataset containing only pickup_latitude and pickup_longitude of all the data points\n", "# This will be used to find clusters (regions) in New York City\n", "coords = frame_with_durations_outliers_removed[['pickup_latitude', 'pickup_longitude']].values\n", "coords[:10,:]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Explanation of Haversine distance**\n", "* Suppose you have two locations, A and B\n", "* Let the latitude and longitude of location A be latA and lonA respectively\n", "* Let the latitude and longitude of location B be latB and lonB respectively\n", "* Then the haversine (great-circle) distance between them is computed as follows:\n", "\n", "K = haversine_distance(latA, lonA, latB, lonB)\n", "\n", "where the returned value 'K' is the distance between location A and location B in meters\n", "\n", "Reference for MiniBatchKMeans => http://scikit-learn.org/stable/modules/clustering.html#mini-batch-kmeans" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "On choosing a cluster size of 10 \n", "Avg. Number of Clusters within the vicinity (i.e. 
intercluster-distance < 2): 1.8 \n", "Avg. Number of Clusters outside the vicinity (i.e. intercluster-distance > 2): 7.2 \n", "Min inter-cluster distance = 1.0933194607372518 \n", "---\n", "On choosing a cluster size of 20 \n", "Avg. Number of Clusters within the vicinity (i.e. intercluster-distance < 2): 3.2 \n", "Avg. Number of Clusters outside the vicinity (i.e. intercluster-distance > 2): 15.8 \n", "Min inter-cluster distance = 0.7123318236197774 \n", "---\n", "On choosing a cluster size of 30 \n", "Avg. Number of Clusters within the vicinity (i.e. intercluster-distance < 2): 7.13 \n", "Avg. Number of Clusters outside the vicinity (i.e. intercluster-distance > 2): 21.87 \n", "Min inter-cluster distance = 0.5179286172497254 \n", "---\n", "On choosing a cluster size of 40 \n", "Avg. Number of Clusters within the vicinity (i.e. intercluster-distance < 2): 8.05 \n", "Avg. Number of Clusters outside the vicinity (i.e. intercluster-distance > 2): 30.95 \n", "Min inter-cluster distance = 0.5064095487015859 \n", "---\n", "On choosing a cluster size of 50 \n", "Avg. Number of Clusters within the vicinity (i.e. intercluster-distance < 2): 11.64 \n", "Avg. Number of Clusters outside the vicinity (i.e. intercluster-distance > 2): 37.36 \n", "Min inter-cluster distance = 0.36495419250817024 \n", "---\n", "On choosing a cluster size of 60 \n", "Avg. Number of Clusters within the vicinity (i.e. intercluster-distance < 2): 13.33 \n", "Avg. Number of Clusters outside the vicinity (i.e. intercluster-distance > 2): 45.67 \n", "Min inter-cluster distance = 0.346654501371586 \n", "---\n", "On choosing a cluster size of 70 \n", "Avg. Number of Clusters within the vicinity (i.e. intercluster-distance < 2): 15.57 \n", "Avg. Number of Clusters outside the vicinity (i.e. intercluster-distance > 2): 53.43 \n", "Min inter-cluster distance = 0.30468071844965394 \n", "---\n", "On choosing a cluster size of 80 \n", "Avg. Number of Clusters within the vicinity (i.e. 
intercluster-distance < 2): 17.4 \n", "Avg. Number of Clusters outside the vicinity (i.e. intercluster-distance > 2): 61.6 \n", "Min inter-cluster distance = 0.29187627608454664 \n", "---\n", "On choosing a cluster size of 90 \n", "Avg. Number of Clusters within the vicinity (i.e. intercluster-distance < 2): 20.27 \n", "Avg. Number of Clusters outside the vicinity (i.e. intercluster-distance > 2): 68.73 \n", "Min inter-cluster distance = 0.18237562550345013 \n", "---\n" ] } ], "source": [ "# Try different cluster counts to choose a suitable K for K-means\n", "import gpxpy.geo  # provides haversine_distance (the import in the setup cell is commented out)\n", "\n", "def find_min_distance(cluster_centers, cluster_len):\n", "    less2 = []  # less2[i] => number of cluster centers within 2 miles of cluster i\n", "    more2 = []  # more2[i] => number of cluster centers farther than 2 miles from cluster i\n", "\n", "    min_dist = 1000  # start from a large sentinel value (acts like infinity)\n", "\n", "    for i in range(0, cluster_len):  # i iterates over every cluster\n", "        nice_points = 0\n", "        wrong_points = 0\n", "\n", "        for j in range(0, cluster_len):  # j iterates over every cluster\n", "            if j != i:  # only compare two distinct clusters\n", "                # distance between the centers of clusters i and j (inter-cluster distance)\n", "                distance = gpxpy.geo.haversine_distance(cluster_centers[i][0], cluster_centers[i][1], cluster_centers[j][0], cluster_centers[j][1])\n", "\n", "                # distance is calculated in meters and is converted to miles below\n", "                min_dist = min(min_dist, distance/(1.60934*1000))  # 1 mile = 1.60934 km\n", "\n", "                if (distance/(1.60934*1000)) <= 2:\n", "                    nice_points += 1\n", "                else:\n", "                    wrong_points += 1\n", "\n", "        less2.append(nice_points)\n", "        more2.append(wrong_points)\n", "\n", "    print (\"On choosing a cluster size of \",cluster_len,\n", "           \"\\nAvg. Number of Clusters within the vicinity (i.e. intercluster-distance < 2):\", \\\n", "           np.round(sum(less2)/len(less2), 2), \"\\nAvg. Number of Clusters outside the vicinity \\\n", " (i.e. 
intercluster-distance > 2):\", np.round(sum(more2)/len(more2), 2),\\\n", "           \"\\nMin inter-cluster distance = \",\\\n", "           min_dist,\"\\n---\")\n", "\n", "def find_clusters(increment):\n", "    kmeans = MiniBatchKMeans(n_clusters=increment, batch_size=10000,random_state=42)\n", "    kmeans.fit(coords)\n", "    cluster_centers = kmeans.cluster_centers_  # coordinates of the cluster centers\n", "    cluster_len = len(cluster_centers)  # number of clusters => n_clusters\n", "    return cluster_centers, cluster_len\n", "\n", "# Choose the number of clusters so that many cluster centers lie close to one another\n", "# while the minimum inter-cluster distance does not become too small\n", "for increment in range(10, 100, 10):\n", "    cluster_centers, cluster_len = find_clusters(increment)\n", "    find_min_distance(cluster_centers, cluster_len)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inference:\n", "- The main objective was to find a suitable minimum inter-cluster distance, which serves as a rough estimate of the radius of a cluster.\n", "- As a heuristic, the minimum distance between any two cluster centers is set to 0.5 miles.\n", "- 40 is the largest number of clusters tried for which this holds (minimum inter-cluster distance of about 0.51 miles).\n", "- With 50 clusters, the two closest cluster centers are only about 0.36 miles apart, which makes the clusters too small.\n", "- So we choose 40 clusters for the rest of the problem.\n", "- **Note:** This is a judgment call that depends on the person solving the problem. For more precise rules, we could consult domain experts."
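, "\n",
"\n",
"The minimum inter-cluster distance behind this choice can be sketched without `gpxpy`. This is a minimal illustration, not the notebook's implementation: it assumes a spherical Earth with a mean radius of 6,371 km, the helper names `haversine_m` and `min_center_distance_miles` are hypothetical, and the three sample points are pickup coordinates from the output above, used only as stand-ins for cluster centers.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in meters\n",
"METERS_PER_MILE = 1609.34   # 1 mile = 1.60934 km, as in the cell above\n",
"\n",
"def haversine_m(lat1, lon1, lat2, lon2):\n",
"    # Great-circle distance in meters between two (lat, lon) points given in degrees\n",
"    phi1, phi2 = math.radians(lat1), math.radians(lat2)\n",
"    dphi = phi2 - phi1\n",
"    dlmb = math.radians(lon2 - lon1)\n",
"    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2\n",
"    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))\n",
"\n",
"def min_center_distance_miles(centers):\n",
"    # Smallest pairwise distance, in miles, over all distinct pairs of centers\n",
"    best = float('inf')\n",
"    for i in range(len(centers)):\n",
"        for j in range(i + 1, len(centers)):\n",
"            d = haversine_m(*centers[i], *centers[j]) / METERS_PER_MILE\n",
"            best = min(best, d)\n",
"    return best\n",
"\n",
"# Illustrative points only (pickup coordinates, not fitted cluster centers)\n",
"centers = [(40.75011063, -73.99389648), (40.72424316, -74.00164795), (40.64435577, -73.78304291)]\n",
"print(min_center_distance_miles(centers))\n",
"```\n",
"\n",
"Passing `kmeans.cluster_centers_` as `centers` should give a value close to the `Min inter-cluster distance` printed above, up to small differences from the assumed Earth radius."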
] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Fit 40 clusters with MiniBatchKMeans\n", "# Note: random_state here (0) differs from the search above (42), so the fitted centers,\n", "# and hence the minimum inter-cluster distance, may differ from the 40-cluster row printed there\n", "kmeans = MiniBatchKMeans(n_clusters=40, batch_size=10000,random_state=0)\n", "kmeans.fit(coords)\n", "# Assign each sample in the dataset to its closest cluster\n", "frame_with_durations_outliers_removed['pickup_cluster'] = kmeans.predict(frame_with_durations_outliers_removed[['pickup_latitude', 'pickup_longitude']])\n", "cluster_centers = kmeans.cluster_centers_\n", "cluster_len = len(cluster_centers)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 34\n", "1 2\n", "2 16\n", "3 38\n", "4 22\n", "5 3\n", "6 36\n", "7 2\n", "8 5\n", "9 26\n", "Name: pickup_cluster, dtype: int32" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frame_with_durations_outliers_removed['pickup_cluster'].head(10)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 -73.993896\n", "1 -74.001648\n", "2 -73.963341\n", "3 -74.009087\n", "4 -73.971176\n", "5 -73.874374\n", "6 -73.983276\n", "7 -74.002663\n", "8 -73.783043\n", "9 -73.985588\n", "Name: pickup_longitude, dtype: float64" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frame_with_durations_outliers_removed.pickup_longitude[:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting the cluster centers:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Plotting the cluster centers on OSM\n", "map_osm = folium.Map(location=[40.734695, -73.990372], tiles='Stamen Toner')\n", "for i in range(cluster_len):\n", " folium.Marker(list((cluster_centers[i][0],cluster_centers[i][1])), popup=(str(cluster_centers[i][0])+str(cluster_centers[i][1]))).add_to(map_osm)\n", "map_osm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plotting the clusters:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAacAAAEKCAYAAAC2bZqoAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzsvXmQXdld5/n5nbu9PffUrlKpSlWqfZPtMoV3Y8A2ZRsw\neAljOtxhZmJoDLQb44keT08ETeBuum1mmgbKBrenAdtQ4DHYuIy3wnu5SrWoVKVaVFpSa+6ZL997\n9767nDN/nJeplJTKRVJqvZ+IDL177znn3Sul3vf9lvP7iTGGnJycnJycSwl1sW8gJycnJyfnVHJx\nysnJycm55MjFKScnJyfnkiMXp5ycnJycS45cnHJycnJyLjlyccrJycnJueRYdXESEUdEnhCRL3eO\nrxWRR0TkRRH5goj4C8zxROSzIvK0iOwRkY/Ou3agc/5JEXls3vleEfl6Z92vi0jPaj9bTk5OTs7q\ncCEspw8Be+Ydfxz4hDFmGzAJfGCBOe8EAmPMbcA9wK+JyJZ5119njLnTGLNj3rnfBb7ZWfebneOc\nnJycnMuQVRUnEdkIvAX4dOdYgNcDD3aGfBZ4+wJTDVAWERcoAjFQX+Lt3tZZb7F1c3JycnIuA9xV\nXv+TwO8A1c5xHzBljEk7x4eBDQvMexArNseAEvBbxpiJzjUD/LOIGODPjDEPdM6vMcYcAzDGHBOR\nwYVuSEQ+CHwQoFwu37N9+/Zzeb6cC0QWxyRRiNYajAERlFIo18X1fUAQpRCVh1FzclabnTt3jhlj\nBlbzPVZNnETkrcCIMWaniLx29vQCQxeqn/RyIAPWAz3Ad0XkG8aYfcB9xpijHfH5uog8Z4z5znLv\nqyNmDwDs2LHDPPbYY0vMyLnYxK0mh59+iubEGOHMNL4f4FcqVPrWUB3ox/EDdJLgl8qUenpXvH6W\nxCAKx13t72o5OVcGInJwtd9jNb9m3gfcLyIHgM9j3XmfBLo77jqAjcDRBea+B3jIGJMYY0aA7wM7\nAIwxRzt/jgBfxAoZwLCIrAPo/DmyGg+Vc+FpTk4gGOJWExPHGGNwgwLd69bjBgWSsEWWZcyMDTM+\ntJ80iZe9dhy2CKenCKcmSePlz8vJyVldVk2cjDEfNcZsNMZsAd4FfMsY817g28Avdoa9H/jSAtOH\ngNeLpQzcCzwnImURqQJ0zr8J2N2Z8w+d9RZbN+cyJKjW8IslCpUKQVc3ju9TGViLch1EBMfzQRva\njQbtxgzN0eV/L9HprIfZoLN00bE5OTkXjovhx/gI8HkR+T3gCeDPAUTkfmCHMeZjwB8Dn8EKjwCf\nMcbsEpGtwBdtXgUu8NfGmIc66/4B8Dci8gGsuL3zA
j5TzipSqnXhbtuOX+0impkmKJfov3YbSdhC\npymFWhfh9BTtpo/j+YjjLHttv1hCZxmiFF5QWMWnyMnJWQlyNbfMyGNOlzY6TYlmpkGEJAypjxxH\nOS4D192AXyyeNr7daJAlCcWurjwxIidnFRGRnads5Tnv5BHgnEuWcHqKpB3h+j4Th4ZIwiaO59Fu\nzSwoTkGlchHuMicnZzXIxSnnkiQJQ9J2m7jVJG63qY8cI41CgnIFpZb/a2uMoeMGvmwwWqOzDOW6\nl9295+ScL3JxyrnkMMYQTk8Sh00cL2DoqUdpjY8jSlCuRxq3ScIQbwHraT7tZoMkbOF4PoVa17I/\n6LMkAcDxvHN+FgCdZcTNBqIUfrmy6H0YrWlNTWJ0huMHFGtd5+UecnIuN3JxyrnoGGOImw20zvBL\nFYzOEMfB8XyGX3yO1sQEaI1B8HybtGAW3B53MkkUAnYfk8kyZBn7mJIoot2wxUgK1S7cIDiHJ7PE\nrSZp3AZAuS5e4cyiarTG6AwA3RHJnJyrkTxqnHPRma3+kMUxcbNh3VlKMX38GNPHj4LRIOCVyqy5\n5RYKleqiH/Bxq0ljfHQuTVy53rIz+HR6QhCy9Pykls9PzlgqUWNWvEQ5+OXyeXn/nJzLkdxyyrno\nGAxJGKFcB69QRCmHqNlgYmgfYKxbrtrFda95I5Wupd1zcasFGJTjUOzqWZF7zuukliMsmHRxNvil\ncid+pDqllhYnqFQ5d3stJ+fyJhennItO0mqhfBcyjVss0m61GN+/D5SDUi5+pcK6W24nGh9Bh01q\na9YtaoE4vk8Wt1Gui1rClaezDBGZW88KWvey790YY5MXHOeMoiki+R6qnJwVkotTzjmTpSntmWlA\nKNS6UCvYBAugjSYNI9K4zfTYGMN7dqF1hlcsUx0cZP2tdxI3GjY21WqisxRHndkCKVRrNsa0iGAA\nJO2I9kwdEIpd3StOgJhN3NBpiuP5KxK1y43LMesx5/ImF6eccyaNQusKwyYUBCuMlQg2TtRuhxzd\n9SQmTUApitUeNtx2N36pDFrTbjbwiiWUu7iIiMiykh+yuVp6hixNVp6dZ8xcXCtbQT2/ywljDDNh\nhjaGou8QeHmYOufCkItTzjnjeF4nM07OOv3ar1QZ2fcixui5c9W1a1FKSKMWpd5+qoNrV2yVLYqB\nsF7H9TyCSo1opo7jejieR9SYQZSiUKme0YUoSuEViqTt9pJp7ZcraWbItM2MjFO9pDglqUZEcJ3c\nyso5N3Jxyjln3KBAqWPNnI14BOUqcdiiUCqTlKtkSZu+a7ex7sabSdsRICjHOb/CBGRpQrFWA6Dd\nmMEYTUKIcj1Mpwhs0vbwi6Uz33ulSlCpnvH65Y7jCI4SMm3w3cWFKYozwth+uagUHTwnt7Jyzp5c\nnHLOC+ciHGkSoxyXvutvpNDVQ6mnh4Gt29BZRrvVwvP98y5MAK7v2xT2LCON26RRhFKKUt+JHmrK\nWd5/EZ2mZEmCE/godf7v9WKhRKgW7fMsFXPKThi9aA1cOX8NOReBXJxyLioHdz7KzOgwld4eujdd\nw9rtNxOUbY286ePHaIwexxjovWYLld7+8/reQcdl1242IM0QJfilMn5QQJXLxGFI3GxgdAmvcOZs\nO6M1relJMAYVuWfV8PBSZrmJEAVfYTAIgu/mbr2ccyMXp5yLxt4f/Aujzz2LUUIcNujbcv2cMAGk\ncchs1Xzr3jv/iLIZfW6hgCQJblDA6exFyjpVHdqNmcXFCWNbx2OF6mrFUUKlkH+k5Jwf8t+knItC\n6/HHcZ9+hm5lqOuOWyxNmDw8RLm3H79UorZmPTq1e4jKvQNLL3oWuEGA0WWM1vil8lzygzEGUY6t\ncbdEkodSDkGlShrHi1auWC3SdkSWpnjF4hXlUsy5usnFKefCc+gQ3pM76cUwqSF2Fd1bt9E8dgRV\nLCEi+KUSQanMm
m3bV/VW7HudnvouIpS6e8jS9DRx0mlKHLZwfH9uc61XKF4UYcrShGimPndfV/Je\nq5yri1VPpxERR0SeEJEvd46vFZFHRORFEfmCiJy2m1JEPBH5rIg8LSJ7ROSjnfObROTbnXPPiMiH\n5s35DyJyRESe7Py8ebWfLWcFhC2YmYFWCx75ESrTFBB6HIcNL3sl3UEJiSKy+vSCVR2M1sxvjKmz\njLTdZjWbZYqy5YZOjblEM3XSzgZefZ7q713qJJlmJkwJ29nFvpWcq4QLYTl9CNgD1DrHHwc+YYz5\nvIj8KfAB4E9OmfNOIDDG3CYiJeBZEfkc0Ab+rTHmcRGpAjtF5OvGmGc78z5hjPnDVX+inOVjDMRt\naDRhpg47H4N6HUcEx/Xw166DyUnCQhFq3WjXpeJ4MDYK1RoEAa3JCWZGR3ADn671m1BK0ZqaAGNw\n/YDChW4rMSdWMu/12ZO2I0QpHG/punun4rgeQbWGTtNVtdzCtibThjQzeK7g5mniOavMqv6GichG\n4C3ApzvHArweeLAz5LPA2xeYaoCyiLhAEYiBujHmmDHmcQBjzAxW9Das5jPknANpChPjMDFhraaH\nv21fZxm4HmzcBM0GTE4RTIxTcj26DKhGw+YlNxqk7TZTRw/Tbs4QhyFJFNqWEh2LSWcX3nIp1GqI\nKIyxlSXOhXazSTRTJ5yeIo3PrsqEFxRsE8ZVSLefZVaLRGx6eU7OarPaltMngd8BZncp9gFTxpjZ\nT5TDLCwuDwJvA44BJeC3jDET8weIyBbgLuCRead/XUR+BXgMa2FNnrqwiHwQ+CDA5s2bz+qhcpZJ\nGEK9Du02/PgRmO1PZAysXQOuY4Wq1bSbbMMIpiYhTcAPoFhAV6p4gU+atMEYJEmIogitNW4QLBgv\nWik6TYlWWBvQGI0ItGdmzqmo62zvplNfX2qUAgfPNThKUCoXp5zVZ9UsJxF5KzBijNk5//QCQxcK\nGrwcyID1wLXAvxWRrfPWrgB/B/ymMabeOf0nwHXAnVhR+y8L3Zcx5gFjzA5jzI6BgdXJAMsBGg0r\nNBPj8NijVqhmKVdg7VpQDvgFKJehfwCOHLLzssxaXWNjuKOjlKKE3o1bGBhYi54YR+rTKCUUKtUz\ntqDQaUo4PUVUn14yLpV0agPqLJ1rULgYIupEFXP33KwVv1TG8QO8QhH3Eq5cLiL4rsLJhSnnArGa\nltN9wP2dxIQCNub0SaBbRNyO9bQROLrA3PcADxljEmBERL4P7AD2iYiHFaa/Msb8/ewEY8zw7GsR\n+RTw5VV6rpylSFMrRo4Lu562MadZKhW47Q4rWgbo6bbCNDEBvm/decUieD7EbZtGXihCmsH4GJnr\nkgmoTquLMxGHrblirE4ULVr7zvF8kiiae70UIkKxuwednJ7Jt1KU4+St2HNyFmDVxMkY81FgNsvu\ntcCHjTHvFZG/BX4R+DzwfuBLC0wfAl4vIn+JdevdC3yyE7P6c2CPMea/zp8gIuuMMcc6h+8Adp//\np8pZkDCEnTutKF27BXr7QCl4/PGThalYhJ/4SZvsoBzwFGy7wbr5jIFG54N+2/UQBDA8Yq2oKISj\nR8Bx8Lu70UGAGhtH2gls3rxgUsJ815w4DjpNz9jbyQ0CSm7vafNmMVrb0kSed8JiUg4qWNxqMsbY\neZ3Ovjk5OcvnYuxz+gjweRH5PeAJrNggIvcDO4wxHwP+GPgMVmAE+IwxZpeI/CTwPuBpEXmys97/\nboz5J+A/icid2O/jB4Bfu4DPdHXzox/A0aPW6kkTuNmHPc9aN918Nm2G/fvB96BShg0bYGAApqag\nXILAh9FR2LcfBvph0yaYnob9+6zwAVKt4uzfD3EME5N2fun0wqy2+6yHMYbG2AhGawq1rjkrxRhD\nuzGDTlP8cuWM7kGjNa2pSYzOEOVQ6ulddjmfqD5NlsQrnpeTk3OBxMkY8zDwcOf
1PmxM6dQx/wD8\nQ+d1A5tOfuqY77Fw3ApjzPvO2w3nLA9joD5txaXTzwmDPd6/7+Sx22+CsTH7ur/fWkzVqj13+BCI\nshbY9LS1mpTYWNS+l+x5L4DeXkA6QmVAZ7BI3ybX92lNTTB1ZIg0jqn2D1KsdZHGMVmazpVEilsN\nXP/0enhpHBNOTxFOTxJUKjhu55mXKTKzmXxGZxitkVXMpsvJudLIfQ05Z08UwvFh6O6xbrxSCXbs\nsEI1/4P4hhtgesp+rZCO6FQ6NfQadTt+Ztrug9Ia2hF4LhwcgiS18ae1a+34oQNWIGpdcO1WG6da\nBJ1m6EwjCEkcEzUaRPUp2o36XINEZ17zQmPMXH28LIkRAb9Ywhjb2uNU95zRem6dUwnKVZTj4hXL\nq5rmvVqE7Yx6mJKkV2+9wJyLR16+KOfscVwrFGFohSkM4Yt/Z4Wjr8+mjte6bDp5mto40+Zr4Lbb\nTlgf5QqMjcNMw1pMff0QeHDkiHXd+b51B5ZLsHcvmAwKRbtHqn+JKuVhSBEoV6pkSij39s+layul\n8IolHM+fc+llScLYgX2kYZPq2g2Ue3tJowivUKTQ1XWSiIHNCJytRh6Uq6clXXiFwqIFYy9l0swQ\nJVaUQqPxlujllJNzvsnFKefs8X3o7YFDPhw7OhcXYnzMitObfhqefMqmlAcF6O+De+6xY7S24lOp\nWqHq7bXze3tg6KB17xWLdp7jwFNPQiejDteFo4dhdASu2QI9PaffmzHQaOCATUHv7sLxA9sjSmuU\n49hCr/NcdI2xERojxzGdxg+V3j7KfWcWwCxN5jYDp8mV1Q1XKetZ1ca+zsm50OTilHNueD747glh\nmmViAn70Q3u+ULSW1S23nbg+U4c4sdZVuQzNpv0TALFWWZpaV+BTT9pPSFHQ12vjWocPQ6kMxdLJ\n4pQkVvQKBbvJN81QnovyAxDBcV1K3QuIGaBcD+U6ttirHyyZYef6AYkbYjKNVzhzt9zLEdtk0CXT\nJm+5nnNRyMUp5+zJMlt+6LnnTr9mjHXNVavWVXf3PdA1bz9PmlnryfNsPClJbY2co0ftn4OD1vra\n86wVMSV2P1RfP7zwgk2GUMrGpmZJEhgetsVlCwVYt+7EeywjiaHU08PAtu1kcUKlv39JcRKlKHVf\nWY0F56PyahA5F5FcnHLODmPsRtof/OD0a75vhSLruO4CHw7sh0IAjZZNI69UrOhUqzah4sgRGBkB\nMXDNtVZMnnvWlj5SYt1/9+yAJ5/oJFYoqHXbeNQsWp8okZSlVjxXEPNxXO+cuu0arYnDFqIUfvHK\nsqRyci40uTjlrBxjbFzoO9+BVvPka9WazeJzHPvT1QUHDlo3HljXXa0bNqy3yRFK2fjS8LDdI6WU\nFaYndloLCKyQ3XIrPL3LljfyA2td3XKLtYpm8X2o1ewmX88/cS2KrFgFhUVTz8+VdrMxl54uSp1T\nzb3LnSjO0Ma2bs8LxeacDbk45aycsTF49NHThSkIrAgpZV/fuN3GjY4cOTGm2bQ/YdNm2xWKcPCA\nPfYC6Om11SDC0FpCjmM36z71pE0xNwbWbYA777Suv/mI2NhWb589jiJrdU1M2vcsleym3SBYlb+W\n+ckVV/PHcZxqwthm+hkD5cLll0afc/HJxSln2cRRStrOcI+P4Y+PnT6g3SlVpLWND22/ybr+Ds6z\nnGap160l5LjW2vECm803Nmaz/ZRjrZz1G8D1rVgJVviuu+50YZrDnPxaY62m2ftKklUTp9k276LU\nJV3EdbWZL8y50ZRztuTilLMsZiYiju+f5sjjB+g6+jwbyoqeksZzsNaN1nNp1Xge3HXXiVTxn/lZ\n+NpXrSDNZ3TUik1jxsadnnnaznEca+FsusaWOjp02LoD4xjWr7PidCYKRatPxlgRMga6ukHN2NT0\nVUz3FqXOSwuPyx3PVZQLoLUh8PI89JyzIxe
nnCUx2nBozwQv7T7C8FNjDBSr1COfW2SK/usGbaxn\neMQWefU8eMMbbTIEKRgNzzxvkxOKRSsWjmvjS0nS6V6nYGTYig/YsSJQ8OHIUYgj65Lbcq3NwBsf\nt8ci1rqaH0eade2FoU1nn5q0Maq1a1eUHJFzbvj5pt2ccyQXp6uIdpjSbiUUKz7eEhW1Z8lSzfiR\nBlMjDYaemAEcIu0wHQdMRQ61ZoQ/NW0z8TZugrvvPmFFZRn84IcwPnpiwU3X2ASHA/vBNTbrLihY\nwZlPENh2G1rbtYtFO2/4uI0pjY+f2N/U23tyuSSwwtdq2Q68hRhG1cmZfTk5OZc0uThdJehMUx8N\nMcYQhyn9G6tLTwJa9Zhmo80TXzvcOeMzFoKvUrrDKu6RkC4/pafL2MSFSsUWg63XrZtuvjABlIq2\nzl65ZC2mNetgcsKKURRZy2fDBpuVZ7QVOEOnOvnDdo1yp6K549isvSCwcav5FIs2hV05tiX8Mvo0\n5eTkXDrk4nS1MD+TbJlRap1pxg7W+ac/O7U1lk+sfVId0og8XNF0+T4qTTvVGYrwyCNw/PjJ03r7\nrHAlia2p19dv401R27riikV45X329XN77HwntYJ2+PCJdZpNuxE3TmHbNpuIMTJsN/Z2d9nKEZ4H\n69bbWFaSnCg0m5OTc1mQi9NVglJC95oScZQSlJb3zz5+pLGAMJ1g/1QZtzejUNDowbVWnFzXZugd\nOXz6hFbTVpSodVnrZ/8+ax0VCqCKdnPuIz+0e6Vuuw02bLSW08S4rRyhT6mO3WrYVPXnn4epCejt\ntxbVNfOSElY5CSInJ2d1yMXpKsILnGXHmow2/NOfP7bomEgXeHrMZaihGetW3LGhSGGmjnz5H08f\nXKmcSICIQtizxwqTcqyVs2ULPPGEFZt22+6N2vEyaz01G7bA60t7T17z+m2wb+8Jq6rVshUnwpat\nuZeTk3PZsuopNSLiiMgTIvLlzvG1IvKIiLwoIl8QkdOCASLiichnReRpEdkjIh+dd+1nROR5Edkr\nIr877/yS6+Ysn29/YReN4eWMdJmOXHY+nvJXnzrG3k9/hWR+eyOlrDXjeva161nxMR0rSGcQRrB7\ntz2vtY09GW1dfq5rN98ePGiTJ8Ce23q9TUU/eNBaV1mnVl+tBo3miTJGOTk5lyUXIt/zQ8Ceeccf\nBz5hjNkGTAIfWGDOO4HAGHMbcA/wayKyRUQcbAv3nwVuBt4tIjevYN2cJchSzTOP7GfPv4wvPXgO\nBTikYZvRVpF65DDngCuXoT5j081vvNFuzD21hFDYsgkQAIi1sjzfilaWWSsr66Sl+75tMlgsnN5t\n99bbTrjw8j4POTmXNav6P1hENgJvAT7dORbg9cCDnSGfBd6+wFQDlEXEBYpADNSx7d33GmP2GWNi\n4PPA21awbs4iZInmuUcP8vBn9p/FbEFEmGr7DDcDUoO1YqLIuvGiyLZvP3YUBgZt+nf/gN1sq3Wn\ngZCyx+WydeE98iP4xj9DOwY6+6HWb7AJEs/tOdGmQwRecS/cfAtUK9DdfXpqeU5OzmXFasecPgn8\nDjCbt9wHTBljZpv/HAY2LDDvQeBtwDGgBPyWMWZCRDYAh+aNOwy8YgXrIiIfBD4IsHlzvu9lluZ0\nm6MvTfDwZ89GmCypcRluFdCOw/bXbbUbc5PEik+raTP1RGzW3qtfY7P0nt1ts/OyzNbV23ItvPSi\nnXfq3ifPt5bTrl0nqlGAFaY77rSv81hTTs4VwaqJk4i8FRgxxuwUkdfOnl5gqFng3MuBDFgP9ADf\nFZFvLDJ/uetijHkAeABgx44dC4652mhOtzn+0hT//MCexQcKZ/hbnUXRyorc/o5tqGtqttnfZMFO\nPHbUDjEG2hHh0EHM7icoZQpTLCIbNsJdd9vKDpN9tur5qRgDLzx/8rn1G6yojY7a/U5pahMsqsvb\nx5WTk3N
pspqW033A/SLyZqAA1LCWVLeIuB0rZyNwdIG57wEeMsYkwIiIfB/YgbWaNs0bNzt/bJnr\n5szDaMPEcIPp4YiHPvXMMiYsPeSen9vApjs22+SGO+6yQjE5aVtgHBwCx+HAWoeZ4z+kVIPuhgIy\nVNVFN46ie7sZLo/jDaZsOupQSjtvWiieXgUdYHoKvetJskoZd7qODK6BrdfZ9PT57TQWo9Gw5ZRK\nZWuZ5eTkXHRWTZyMMR8FPgrQsZw+bIx5r4j8LfCL2HjR+4EvLTB9CHi9iPwl1q13L1bYngW2ici1\nwBHgXcB7jDFGRL69jHWveozRGJMBLi8+epyjL07x7I+PLUt4luLGV/dz71tutAfz69gNDMCbfgbG\nxsimJ/iR/gbX7zMgkChN1l1AB2OUnRLfHX+YqCuiq+gw3l3j7srdFDZdz5Fnvkv1uQbFWFBAogzN\nkkapEJloUI/ByQxdL41RwqAGXrm8m04Sa62BFaneK7ezbU7O5cTF2Of0EeDzIvJ7wBPAnwOIyP3A\nDmPMx7AZeZ8BdmOdSZ8xxuzqjPt14GuAA/yFMeaZxdbNOYExGXE8YQUqC2hMtHlx1zGbbnKOlAaF\nN77n9kXHxF01dqsXyOouz61rMlBXJIGiu7+G71d4KdtHhG3WN+1n7O9p0b/G50DjYYYGXuTaTHHn\ngYBK6tHa0I2emaYtmqSgSB1NogyZm9JSdfplmdk+s80NjVnVRoQ5OTkrQ4y5esMuO3bsMI89tvhG\n0ysJrdvE8RRZFpImhq/86fNMDgFagGW6wBag0qt4/++/don31jx0/ItMxGMoUWCg4nTRU+hm0N9A\nj9/HPw9/iSadthoaqqqbjJQWNs1cEri22c2rN7+NuOwRjR6GyUlmugrUR/fjTk5T8qr4t97BQM91\nBGqZfZuyzMaqfD9vQJSTswxEZKcxZsdqvkf+VfEqwu5LFrSOGB3ez+C2w4ipMTNeJGlhG/OtUKSU\nD7/yH1+z6Jh2u81fH3uAmAhQBCZgY3ANNa+b3sIARVPgX0a/SkqGoDBofFUkpEU6z6wzHoQ9VQ54\no2xxr6ewfjvOBocBUaQb7qaeTNIyTcpOFU9W8ByzLeVzcnIuGXJxukowRpMkUxiTMT4+TqL30DWo\nUMrQPRMxdbzI1KEKOoFlC5SCn//te5gZD0ndhHKlSOCenlDwP4793/OONAkJvUE/vhMwMT3OE8kP\n5q56+KzzNzIej9Nk+pS3c0m9hOHoCAP+IL3BwNw113HpdQboZYCcnJzLn1ycrnCMMUTRMFNTz5Km\nE8AAaboTz4NKv6bY1aAxHhNUEtxCSmvCp3G8gg31uSycpW955c9fi84MQ8PHaDsN3ILiho3XUXCK\nZMbWMPqLof/ntHkazVQyQXfce5IwAbh4TMYTpwmT4FBTNTCQmISKc0qLjJycnCuKXJyucLIspNnc\nTxjuResWNhHS4nn2R7kxpVpMz8YWrSkhbTu0JsocfbabaLrMQqkFG+8osvW2NfZSmmKMQWtNpFuE\nWZPx9hjfmvjyGe5K0+et4fH6ycLkETDgr2Uofumk84LDgDuIxjDgr+GW6t347jLjSTk5OZcluThd\n4dhtXwatU0AvOMbvFF5IkpRCFbI4oVQzeMWU5x/OcAOfNPSYc/dV4G3/64lU7cwbZLh+nHJ3QEGV\nmYxHFhEmKFPh+ZndxLRPOn9L6S6ebP3otPFVVSUjZTBYxz2991FyyqeNyTl7jDHL7vGVk3OhyMXp\nCidJGihVgE6K9mLM37Na6m2jCjH3vq/B9LESh5/sY2akBij+tz98/Unz+rp76OvumTt+6PD3F32f\nJg2apnHSuW6nb2FhopuiU6Hm1ri96+W5MJ1HjDE0oow0MxQ8RXGZ7VRyci4EuThdYRiTkSTTGKNR\nqkCaTjN5CzTpAAAgAElEQVQx8fCK1vA80AqUa8gSTd+mFlkshK0ufu0UY
TqV7x/4NiMcWdH7ufhM\nZRMLXmsxQ0mX2V69nW6/Z8ExOWeHNpBmditJnOpcnHIuKXJxuoIwJqXdHiOOJ1CqCEwzMvLDs1pL\nOfYHDEpl9Gyt86/fed+ic3YdeJLdrHzfWLrgLmCFQjBA1etlfWnTAmNyzgUl4DpCmhl8L28xknNp\nkYvTFYSNK8FoI6Me1lH6IGWZOqc1vU6oqW/gdlqtQxSL60nTBq5bxXVPVAAPw5Af8vVzeq9Ztqob\naagGM+k0ZbfCq/vfcF7WXQkmTdFhiCqXkSu0N5SIUC26ecwp55IkF6crCKUCkIBmUuLH+6cw7SG2\nD8Kair2+3Dqop+J5mxDx0DqmXn+amcZujNEM9L+Rcnkrxhj+3wN/AG4AjoA6t1+rQ3of670t3N1z\nL+uLG/HU2VevOBuM1rRf2odJElSpRLD12rlr2cwMutXC6elBXSFFYnNhyrkUycXpCkIbOFb3+NrT\ndf74ewe5vmeAn71xnG39KeuqKRVs+biVfBb19b0L37eZfo5TZWxsF63mPowxjPFt0tThuztfT3f1\nVhrBBtLiRsiSZSnhbOmsLM0wGJJRjVtVmAKETpMNhU24F1iYAJsa32nzbqJw7rSOY+KhITCgGw2C\n66678PeWk3OVkIvTFcCj+8eZChOqgcNzR6f54+8dBTz2TgrfP9DN86MpayoJtw2GbF8TUyosuSQA\nrvtz9PSsnzvOspAgWEuj+RxxNEKjsZe9L/0+LlBsHMBJQlpZi7Y/gKbbBq0WKAuUJAnRsQzHEbx+\nBRrqzySkR0DWZhQ3xGyp1HA2XvhfT6M18dAhsukp3P5+3MHBUwbM/nn11qTMybkQ5OJ0GZNpw899\n8lu8OBpxy7oS1YLPd/fNxpgU4LN71DDYytg37iPK4LqaTV0p5cDWfjizgdPFpk1bTjqjlE+xuJ7+\n/p9ievpRDhz4b0Bn95Np4EYxrm4QJnVmuBbjdZFoFxMD2iCugAutfSnhs5pgQJG2NemEJt6DbS85\nCY098CN28vqbfxr/ArvO4sOHaTz2GOETT+CWy3S/9z24PTZLUPk+/uZN6GYTJ2+tkZOzquTidJmS\npBm3/x8PEXa+wD95tAW0ThsXZwGH6xrIcIcMLxwXblpn2Nbf5tY17TNU0SvQ3X0TYXgI1y2idYTn\n9SHiUShswg/W8NRT7z9phgckxHjxJJK10colzZqkMkh03KM9olElhQQQPaEhhSjUtn1ke8GbIAzD\nCy5OGEM2MoJptTCBT7TraQrz3HdOrYZTs6WTjNagNZK32sjJOe/k/6suU77xzLE5YVoaBSieG/eA\nImNhSJgqFCkTLcNt6zT9pRNWVE/PXYgISTpJo/EMBg0IAogodj/zBwu+i03si4mzmEp0iHbSojVp\niI+uJTks4GS2SMXsfS/RR2rfvn3ceOONlEqlxQeeR/xNmyi9/OVkMzO4tSr+tVsWHGfimMYjj5BN\nThFsu57iTTddsHvMybkaWHVxEhEHeAw4Yox5a6eL7eeBXuBx4H3GmPiUOe8F/t28U7cDdwMvAd+d\nd34j8JfGmN8UkV8F/jPM7QD9b8aYT6/CI110kkzzG5976ixne+ydgsPTPg/uqnLv5jaNJOQnrplh\nrQfXX//bRNEIIoY4nqQtYLKULGvhuGX2vvh54Nii7+ADEo2gSZGJCvHhfsCzbrsV8NBDDxHHMffc\ncw/uWVonK02TFqUo3HgD+B5KhOKtt85dSycniQ8dQscJ7uAA2cQkAMmhQxS2b8+z3nJyziMXwnL6\nELAHmC0j/XHgE8aYz4vInwIfAP5k/gRjzF8BfwUgIrcBXzLGPNm5fOfsOBHZCfz9vKlfMMb8+qo8\nxSpjjGE6nSTVCRW3RsEpnnHsBz71A5JzejePyFhH3DMjwnTbIUqEj/3SvwagULBJAI5T6dxbiuMU\nOXrs+yTJzuW9RQZeOg3NBtAEuld8l
+12myNHjrBjx8p7mpkkob1/P3pmBndgECmXMDMzOL29c265\nM9765BSObwvL6lYLp1azpX527WLyr/4aMzlJsG0bpVe8AqUEb/36OWEycYzJMlTxzP9+FxJjDGFs\nayoWfZULaM5lw6qKk4hsBN4C/Efgt8X+z3g98J7OkM8C/4FTxOkU3g18boG1twGDnGxJXbYkJibR\n1oAMs+aC4mSM4WNffIrvHJg+7drZ4TEZQT1y+IfffBfGGOpRSsFTBK6D65apVrdb4awfYGjo/1rR\n6kplOCrmXLrs3n333TgrbARojCFrNMim6yRHj5KOjiGFAm5/P+2hIZLDh3F6+6i+6idR5dNr9Tnd\nXehWC/E8VLGIThImv/hFph78O5JDh1ClErJvH866dZRu2k5w440A6DAk3r8fow3e2jW4/f1n/dzn\ng/pMg28+0zzp3PWDsKm3QK1WRV2hm4tzrgxW23L6JPA7QLVz3AdMGVsqG+AwsGGJNX4ZeNsC59+N\ntZTmR15+QUReDbwA/JYx5tCpk0Tkg8AHATZv3rzc51h1XPFQ4qBNhq8WzvU+OhXyyL7J8/zOHl//\n0CsREYbGW0yHCSJww5oqvms/vJrNfezc+aYVreq60GoWaLUGsI6+lXPvvfeyZcuWZY83xpAMDZHN\nNHC6arY+j4CqVdFhiG40mP7yVzBRhKpW8datJbjuejAap1qdW8ft7cXp6gJlLY2pv/97Zr76VbKJ\nCSSwzyK1Gn5fH7rZor33JZxqBVwXo+2vo26dnpxyLkRRxDeenMYBfvbeNUuO/6cfDS+YZ7J3BPaO\nREDET9xQZk1vZcH5sxaXMVAMFCq3uHIuMKsmTiLyVmDEGLNTRF47e3qBoWcM64vIK4CWMWb3Apff\nBbxv3vE/Ap8zxrRF5H/BWmWnVSk1xjwAPACwY8eOS2azihJFr9ePJsORhf9ZfFdx26Yu9o618LAh\nnHTBkcvnz375Zraus2nR7dQGhYyBOE1ph3tpNvfzzLO/sfwFjf2RDPrSrbT9HlpuQjuFlVhQr3zl\nK/npn/7p5b8v1pWXzdhq53pmhuJdd+GtXYtut2k99RTT3/gmJmwhWpPV68TDw6THjiPlEqpUwu3r\nx79mMyKCOI4VGN8nm5rC6elFxifw166l593vxh8coL33JXSzSfTsM7Rf3EvWbFC46SbKO3acV6tJ\na81Xn7TWcgJ88UfDvOMMApWmGc8OTZ0pAfIkfvBCkzfdEaCUIvDkJEsqTg3txLoDRaCUF4XNucCs\npuV0H3C/iLwZKGBjTp8EukXE7VhPG4Gji6zxLhZ26d0BuMaYuQCIMWZ83pBPYWNb50SUhWQmo+iU\nULL6LhARwVngn8QYA6mmr+TxntsHeFsc8ePpmMfrCU83YjIDv7ou4DtTKbtby886eP11Ve66bt3c\n8bruIsP1iJLvEDgRUzNHeObHfwCeDypevDHuKa2iNrGNknsN5W4XT2fsGnXQc5GypUVqpcIEWDdc\nqWTjRF1dKKXwN2xg6lvfYvS/fgKdJEilQunmm3HXraX5ve/jb9mCYPcwhZmtVKGqVYLbb8eJ2ojr\nUH7VqxClKL/8ZVTf+la8irU2/GuuIdqzh9ZTu2i/9BJOtUq462mc7h6cag11nrIMw/by/k2NMYzN\nJEyv4HfgwEiT6Uab4Rl7PFCGO6/rxvNO/B6q3GjKuQismjgZYz4KfBSgYzl92BjzXhH5W+AXsRl7\n7we+tNB8EVHAO4FXL3D5tDiUiKwzxsymkd2PTcI4a2IdM5Pab6uZSal5Kw/or5Rm2iDSLQJVpOKe\ncDOZqRZZGJPVQ67bO4GpFSi6iqrnEJiM37+pnxRhsBBSHmlyqJVyVNt/3JIDYQZFgRlzwkwtAHdf\ntxatMzAaRFEJXCoD9oM3S4UXvvsCMv1qqD2F8YZBh6A65Xxm9UWDg0dAgZSYmDZlutnkbAdXEZQU\nv
a7Hpio8O6p5aUZhv/8rYOFv4x/+8IfP6u9PRAi2XotJ07m9R+n0NFNf/P/QUQRZhhiDs24d8Ysv\ndHpGpBTvvgsQ2k/vovWjR8gmJ5FymcF/92EKm6/BqVap3HcfqlKZEyYTx6AUTlcXhW3X21hTkiAG\nsuFxWo/sIrjd4K7vwy2c216twF/catHGECcaEXAVFAs+hWab6BS/QAWY30XLVzA03Caa98VitAlf\n32U3cr/s+gqD3cU5925OzoXkYuxz+gjweRH5PeAJ4M8BROR+YIcx5mOdca8GDhtj9i2wxi8Bbz7l\n3G901kiBCeBXz9cNmzN7Hs8bxhhamf3oaKUNwlbIVyceJM1i7oxu5ubGVpgIKbiK1Ajby7Cl6PHL\na0s4nXjAmwZKbC8KfzcW8cOxiO01nz7fQSEgmsOh5rvTNunid99yE7evC2gNvwSJhy72YYIeHM/D\nGMOur38NM6lR9KEbN0ChBuwDJwDVgpYPqknBKbPG2USGRpNSZ4LruA0XF8/3KPqQ+NBVgIIDM23N\nZKJIjN07dWoL+A996ENUKgvHQZbLrDBFx44x8of/heTwYWvFGENw5x1Iu404LqpcwFu/nu63v51w\n927CJ54gnZiAJMFozcxDX6P4b/4NemYGk2ak00eJ9u4lPnQIVSxSuO463MFBvPXrKd17L+29e4l2\nP4vKXES5MBlh9BR6bQ1VWWbNKE5Pf3cdxU/d0cd4vUWaGTb0n/z30+w0DBSBYuCyZbDAYM3D9xyS\nzOA5QqZttl7gCi8dbzLZTFHKMFZfuDsywKN7G9y2rsGWjYM4Ki8Qm3NhEXMV1wjbsWOHeeyxM/cf\nCrPWnFvPkdX3uU/G49TTadpZix+M/wttWqChFpZ57dCdFPDth/4CFkemNRrwlGIkbPP98Sb9gU/N\nFVylOBTG1DP4kwPT/M277yLzHEiOUVWT9HiatlTIymtwK30cePxRxl+0hqfBkJJiaKGZgcJxAFTW\ng3LrbOyepMfvwsEjI8EloLZA2niSQKrhuQl4akQ41hDaRpj//egjH/kIxfOUgp2FISOf/CThYzsx\nIqjeXpzubuI9e5DeXgo33IDf10vtzW/GrdXQcUzj0UcZfeBTZAcPonp66PqFn6fvHe/ApCnZTIPo\n4EHq//gPxIeP4HZ3U3njGyneeCPepo1kYQjtNlIqIY0UMhdxPStKxiCeg1SLqGpHpAyYKMZoQAni\nKvR4g2yigfgu7pouVLd1C7YTTZRkZJmh4DunpYTXw5Ss0zSwWnRxHcEYGzOaCTMcBY4SqkUHpRQT\nMzHNKGOymXB4NGQpr+F927soFVwqhdX/LquN6Wz2zoXwUkZEdhpjVr7HYwXkFSIWoehcuMoEAEVV\noq1Cmu26FSYABXW3yfM9Q/QlNbraFQaT0zvCOkrNSdZgMeAdGwOSJEHbJdjgK4bDmK/ct5lSu4EU\nS5igTCAt0qiJ6WSaHX5+z5wwga0L4eEBXUAXSTTAbNZDj/RSibspSorruZ1xC+N51hN4Yw8MlAyj\ndUOrawtPHKhjvAJvectbVixMOorQYYhTqyHz0s3TRoPpB/+O9Nhx8DxEa7zrttL8yj/ZHk31Oly3\nFe26HPujP4JSicr11+Nv3kzP/feTTk3hFAIKN92MqlbtT6PB+Oc+R7T7GYwIGIOemqQ9NET0wgtk\n42OoYglv0yZKd96J6unDTIeQGUyrDZlBRzNzBWNNnEGaYeIUVfTRAlk9hDjDZJDNtMkKDsZxaUQp\naWYTEwoL7FUqBw5RrHEdwXXsNRGh4Dv4rpqznlQneFQreQSeg+cKgubg8fai++aiJMX3Vv/LWRRn\nhLFGKSuyeYbg1U0uTpcQrnJRuEQ6PPmCA6OlSYKWhxihL+nCYek4gDevqmu3B91FBYQw/Bxm+DmC\ndTcjG2/Drawn1T5ho83wk48uviZB51WCGI9Wu0rFjZbdK6pUsD8
bt7+SuNzPxu0uTm39slx5URSh\ntbYilqbE+/aRhREmbBFs24a3Zg1ps8nYf//vtPfvBwPexo2UX/EKxh98ED0+bj/hKxWi0THib3wT\nRkfBcQh7eghuvpmuN7yB2j334JRLBFu22PTwNKXxL98hHj6OKpcxaUrx9tso3nkn6cgoOsswSYIm\ntKLVbOBv3ADlAjrTZEPjkOqTmxZqjck0xClaKYwjGDHgdKyoKCI80EZ7LjOFAPEclBLSTOO7JwuF\no4RyYWHxUEoITslosCLmUAoUgedQK3okmbVYXhxqcspvH7VSQHGJuNf5IOlYf1pDlhmUm4vT1Uwu\nTpcQjrgcmT7AU+EpAuFC02sz7TcZbHbZdPNliNPpGGAExTBCCxl7DMYeQwBXNnHowBQ2VUKxnIy6\nduoQOIooVXixZkU1WpWLP7iddW6ZLNMUCovHZKIoYmJiAoA0TakEAUYb0rExyFLS0TGaw8OMfuz/\nJJ2YwBsYINhyDdU3/RQzP/wh6e7dkGVQKND1S79Esm8fcb1OZ0EYHaX9/e8z8txzDPyrf0Xvr7wP\ntKb5zDOMfPw/0T52DDcIcDdtonjrrVTf+AaSoUM4fX24A/0klTJKKbyNG05qs6EchdnQA2GMCVwk\nM9bNV/DQky2oFdFximiDUy5iehVSLsJIHZ2mEKd4vovrOHi+w5nTJU/5+4ozkswQeOqMCQ0iQk/F\nw/eEsK1xlPCqOwuM11vs3Gcl6k139FAuXpieWr6ryHSGUicswPOJ1oZ2ap8zT/K49MnF6RJitDXC\nj8OFC160g4TJRp2G24timp64SmHOilkOCdBEOILMlR+cd7V1iO5yidFmSrasTbMebWNIjUsrDnBV\nG98/c3D9NNI2lHrx4xY0DkK7CNWN4Cz8K5ll2UmvVRDgrV2DbjbsnqR2xPjv/R7pgQOQZaS+T+8b\nfwrvmmtofPSjVpgAgoCet7+N1g9/RPjii2QHD0K7DRg7pjHDzDe/STo9jdvVRf1736P97LP2fXt7\n6X/b/XT9zM8w889ft00J05TiPXfj9g8A4K9bO9diA2xyg+O74C/wXNpuETAiSKrthmHfR5V8skqB\nIA1JXIcu3wFjcIxZ1oeq1idKFmU6W3JOOXDxHY3BCkS5UGPz4OIlnlaDwFME3rmJhjGGVjubSwDx\n5j17s20TRwBUcXUEMOf8kYvTJcJ0PMnXRr945gEK6l0hR9xRWlEbX3sUQjljQ7+TmY0otACbJnyq\nG85xoVZoU49c7DaZhKWsJwEMijhzacQxXcsO0Qlc/zrQGYzugfaUzQJ0S1AZXHBGqVQiTVO01lQ7\n1Rzc/n6c3l6i558nfHo3Ok7mWv0Wb7yR+OgRRv/ojyDriKYIhde8hujJpzBZStdrX4NJM+jpZvoL\nf4OemMBdvx6TpoQvvkj8zDPosTHrZyqV8Neto+vNb0aUQkpFkoNDiOfaFPLZ2nravpfWGjPWwCQZ\nqlZAVU+Pp0nJh1TjVHxoxhhtUOUAEcEdqOL0V6x4tTp1kZfpWROxe5NSbeYyOZfCu4wsiUwbmpH9\nslEuODjz3JZpZohTK0BhrC+r58o5mVycLjLHosM8Pf44I8lRWswsPljgcHmchrS4YV8KwXrwy/bD\nc5bTgj+zwmQ7+QnRgksHHlR0xtpqxGRomGwvrTQGj0ZiCDNhayEjWU539nUvg61vAL8EWWzFFUCn\n4J7ZYhMRurq6Tjtf/9a3mP7bv0V1d1O443baL+3D27Ae/9qtTH760xBGVgTLZVi3jsLGDRjHwRsc\nJD12HP/azahqld7P/AXJ5CTR009j2jH1H/wAPTMDQQBKUbjrLvo+8AGy0VG8tWsp3nqr3TRdrYIx\nuAPWcpqrDBFnmMR+gOpmvLA4iYDnWEddbeHrxlXgObZvVLC8/64iNgY13UznLIkrqcJDnGqyTpmo\ndqJPejZHCUqsUXqqZVQOHNq
JxnFyq+lyIBeni8h0PMk3j/8jzZO2Ri6CByYx3LO7RTWO0XoUdAJZ\nirQnMW4JqmtAbP+mk5ViFFvK8MzJC66Cqp/STl2m2glmybiTkGoPVxKU45CRwWICFWyD7fdDGkJz\nxJ6rrgOvCIVu+7NMsiyzxVj/519CHOM0W1Re91rW/ft/z9gDDzD1la9A1AalQASnv59sdJSpz/wP\npopFSq97HdU777R7opSQTU6hECqvfCXp8DBRp0CsNJsEt95K9y/+Al6pRDo2jtvXh1OpWEHSGlUs\n4q05xeLzHXAUZBpVOvuYjYhYC2uFGJjLzpt1ZV0puI7QSZg8TWSUEqolF60Xvla8gkT6SicXp4tA\nK23wPw8vVoj9DGh4w/eEzbGPkRCFYOIZJG0hyoMsItMJeFUodXcKMWhwpoAnEBr4S3xOBq6m6MUo\nSmSJAWQJ754i1j6TLZ+eYpuSd0rcSbbC4AD0bYPB663/MJpnvQU1qCxdyPSkvwatGfnPf8jMww+j\nGw3wfVzHIXjZyxj+g4/T+M53MEkCQWArONx8E/FL+8jabbvhKklofeMbmIlxKq96FYWtW8lm6qhi\nkeT4cZKhIbxaje63vpXiDdsovvzl6PFxdLOFKgTguijPQ4KAdHwCJzk9EVsphaypWVedY11LJskw\nUQKOQoreaSnhxhjrwtM2YULOIX3bVYLvCqk2BJ7QTjJcJTjO5e/m8hxFrWj/7tQCtZWUyJxBnnP5\nkovTBaat2zx1eJk9kU5h7WHYGHeqYpsIL8nI6JS10wnoFJVl6LRpRaBQBlMHfohDa0mXm+dBHEMa\nuajMJ+toExGgQIxCaUWmMnDNXIEHbYSjzQqN/5+9Nw+S7LrOO3/33rflWvva+4oGCQIiiIUgxUWk\nRJkCl5Fpm5RojSRTwdEfM5Kt0eqxLIdHihDDDln2OEYe2gqNFkrkkLZJanEwxEWkuInE2gAJoAH2\nXtVde+X+lvvunT9uZnVVdXUtjW6i0awPgeisl+9lvtzed8853/lOx3DHeIIxLmAh2AP3/X0oDrll\nbu9iHFagswTSA28nog6wWnP5tz9E43OfwyQJhCHh5CSV972X2h//CZ0nn8R2yS88doyh/+WDqDCk\n+aW/pfHVr6Bf+I47FynJ5+fJTj1PfPIp8iQhn5nBv+M44cQkwfg4/t49lF73OvSlS9g0wxsaxBsb\nc+k2a7FxgiqVyGt17GS+ptcKulHPqtW7jTOXbzK5i6zWScLRxv0P2CR7UeTkUnvu573UzGh0NFLC\nWF/4ouowubF00twNYnwJ5kPp3LDc0ujc0FfyvysS9128NNglp+8yJIJK1Mc1Sj/XRgaHlhWNyNAf\nX7m4KK7UyTMswiaILMG2piEziOg8UrTZjsFFmgjalwexM/soWUsramOkJYgjci/DTyKkkGgvIfE6\nKALwNLmnMCqnWPbJTeI4SwJ3vBmCbv9S1oKkDiqCwoBL5+0QJklofeObxCdPIjwPmecUH3iA6rvf\nReebj5BeuABKISsVgqNH6P+ffxIzPYX1fMqve4jKG99A89HHaH7ta25I4OCQEzacPk38ne9Ao0F6\n7hz2ta8leve7iO6/H5skZItL5MvL6IUFvDEX5QkhUP195Ms1VLVyFTFtBOFJ13y78gatgxI9lQli\nPXG9CCS6J9Jw9ZoXQ05xmpNp14StpCD0v7vklGSGTppjLdTbmtDfHedxu2KXnL7L8GXAsZETfOXC\nX+/sQA1LFdCepTBtKOVX1yFWB0ZZWofUNYUqESDDdNOHzzKIF/poXjgOcYkKgqhhutdKF0IZkWNV\nTk5EUQ9ig5w4aKNEBkKSZx61IKa/T+N/3wdc/Ut3nOgh6boj6A6YCqid1WFMHNN56imyqSn88XGE\n7xMePcr4b/xL4pMnibUmmJwkU4rg2DEG3//jmFoN9uxBz86RnTsPQiCBwXe+k+DAAby9e0nPn
Ea3\nWsSnToGUWK3RU1O0vvJV0ueeo/JDP4RpNsgXF1GVMnpuDr9LUMHevdiJiW0RE4AoBOAbZ1e0QTpK\nSAk9u6MbmH6rRIqFzFAIXCNvO9ZkucVTgsiXJNqSG0shkHhbPK9Lo3Xl2C9BhtBTEoHAYrsOF7u4\nXbEtcupOsH0/cNha+6+FEPuBcWvtN27q2d2m0FnOQ/Lv8U3zt2haWx8ACAVzlRwtBJlRbPWzdJd+\nD1olsB038uIaGbTcuLaj9qV9EBcAgUQQdg8wGHJypJVY7eHjGmZNmuOnEQIJMkfnPu3+mP79r4Dq\nBMROto7RoAL3t1dgpwUBvbRE/PS3MM0Gqq9K5a1vQU7uQXqK+mc/B502wZHDoCTB4UPYZotLv/pr\neEND+EcOU77nHsTgHtf/ND8PgKqUCSfGWf7Yx0jPX0BOTGA6HYLBQaLjx1zENDNLdukS1hiy2Rls\nrlnfBLtdYlrZf4uoxZHW1pdck2m3ZLBgE42QIKIAoSRxmlNv65U5TMutzI1dsU6C3WhrDJZioMgC\ntWJr3EkNlcLm5xf5ckURtxWR3QyEvmR8ICBOM4TJSFNLGO4sNbyLlwe2Gzn937jSxluAfw00gP8K\n3H+TzutlD2stjYWYLM0p94eEqxRb5+cucLL2BHHWcnMMBHibBRIZBBmEAiYXLCW73YuC54TknQJp\nVCPINnb4tManM30Q3alwJUm4ap5P97/1yJFYXPMmRmLyEmM/+B6isXE3hiNPIW2CXwS/5GpMAveC\nbW8q4eavJZudJT17FlmIyJeX8CcmCO44weIf/AHtJx7HGxykeP8DRCfuoHjPPTS++jWajz+OabXI\n4pjs8iVIU4qvuc+5iA8OIgcH8YaGmP/Y/0fz85+HwEc2m1Tf9CaKr30QgSB+9hlUXz+qWKTzxJOY\nWp20Vie/9zUvYuj85ljtRm5TjclzTDOGRkquLFKDwOJULQKRa7SUyFQjQh9RMVCNaCU5cXdQYKYt\nuXFpyDjLiQL3GRjTVfQpgTEWa1nTL3QtCOGEFi8lPCWRNsdYS5ZleJ6H2uEiYRe3PrZLTg9aa+8V\nQjwOYK1dEkK8uCE1tzmyJCduORVXazldQ05KejSfAsZCIIeiRsdAx4M+7TQC3RG3Ho4u+iQcyGCg\nrKJ5kfwAACAASURBVMg6FlrpKlPWa5xDpom1Zro9S6O2SLVf0V/xmV+MyXJLtexTKfp0agEslyAO\nu2v27V1+1ZqKF0z+0usIS6tsiKTvIibhQdZ24gdrwebQmoek5mpSlYmrSCqp15n6p/8M02pRevAB\nSnffQ3D0KOHhwyx8/BPUP/MZrNbYdhvxpjcRnTjhZiyZHFkuI4pFzNIiamCQfHYOPT+P//rXufHr\nQP1znyd+/HFnHttu41WrqP5+TL1O5Yd+iOL3vx7aHWyWYtJvYlot1ED/hum47cDmBpNqTL0DsUb2\nF5HVCCEEJsmwS22n0+gvIJTEdjJsM8bWO5BoMAYjFSr0sGmMKATYHMi7XxRrXS8UTqknhSOSYiSJ\nU+sEBEUfKaUbf4GLQgqhchoNY19WvT9SSowxCLF2gu8ubh9sl5wyIYSim2wWQoxw1ezTXayG8qVb\nleYWL5TkmaG+0CHpZJz81CL5uQG408DeNpDB6QEYyWCohdkfUxFQ6tbHtYBBAcWCJJHg1wyqI7HG\nYskQOKJYHd1k2l20zi7O8vz8JbSK8S9bmp0MYQXDw0WmZzpEgSSttykmI0wEY4Sex/XUl71/sG8t\nMYFT5Ylumirod5GUX3RpvrThCCvruJxiYQCivhWSuvCBnyF79lmwlobW9P3ADxAePkznO9+h8+ij\nEASQJHh79lJ89atdZHD2HKpYovzAg2RHjmI9j+yFF5CFiMIdx9daI+kM6XsEdxxHjo4RjI+5QYTV\nPtpf+Sr4PiZNSJ9/gWTqIv7gEOGJEwSTk2teolPtaUg0R
gKdDGEtor+A7A4ZtNZiWwm23sEstBBS\nOD+90ENEPjbOnAksYJZaV5SNsvsFEAI85YhRgiiG3fSgQHb9+oQvkeUIISXVoqAYuvs9JVacO3pR\n2XqTWCW2FzXdSgjDEM/znGR/VxBxW2K75PQfgP8OjAohfgs3yfZf3LSzug2glGRgokSeGYQUzJyr\n0V5O+cx/+VZ3jyI8FsIp41yF+jq4G4bqcMyRors29aTiAwIILJkULB2BJWsI21CaN4RNhdAalVlH\nUBkYa/j6med56vIUSW91vQozi2sFEmN+Ea+/nxH2EPnbGF3RW6r0SwZ++m5KI5Wr94n6wItcjUmu\n+qpZ46Kq3m2jHVFJD60FtU//Bdm5cy73JASm06Fz7jwLH/2ok2f7Pqq/H//ECQbe917yhQWWvvxl\n8H0nXCiVUYMDmDQjuu8+wn17MdbSfuIJpJLIwUHkyAiyUiHq68c/doxgdAShPDA52cwsaE127hx5\nvQ6dGGstMgydVZG36rWkGrPYxHZSRzCBh0RglFwhp+6EEZDySjVJypX55yLwQGUrRCcChU1z5HDF\nzX+KNSrynfzcggzUphdkIQT+DVT73YoQQuB5u3qu2xnb+nSttR8RQjwKvBW3lvufrLXbGoPejbge\nAaaste8QQhzCjWgfBB4DfsJam6475v3AL63adDdwr7X2CSHE3wATsOLs/zZr7awQIgT+CHgNsAC8\n11p7djvneLOglCRta77xF6d4+ouzG+0Bze5FZAnQEjqWQEuSg3UGhzN3XQd8pRwheKx8akkEWkFU\nyl1BPIegaQgXFY8++wKPTJ3b9rnOZucZjMcpyiIKb824jasw5lO4c5RwtEDhjiFUaZMM70Z9TELC\nwEFIR1wElTty0vNnqH/hqzS+dgoG+p0ha6WM1ZrZX/1V944dOUz5wQepvvUtyMFB2o88QnxxCpnn\niCjCHxwgb7eJH3+cvNNBDfSjnz+FSVJQCm9ggMKdJyAIKT34IKbdQQQBQkhUuYQsl7F5jgwCRBRh\nn34apMQbHkKVSmTT04RHjlx5LRZs5go4FoFItBuYJyymECBKgYt4Cj5CCUTkIbRBVCJk1wxWFgJs\n6DkO6/rxichHFH2UlK4u+SLQi8pupAJwF7u42diUnIQQg6v+nAX+bPV91trFbTzHzwPPAD2b4w8B\n/85a+1EhxH8CPgCssUuw1n4E+Ej3eV4FfMpa+8SqXd5vrV0/wvYDwJK19qgQ4n3d53nvNs7vpiJu\nZTz79Y2IaT18aPjQCGkuxxQzyVB1kXB9ECNYUxLKB6BVdWI8v9ssW8vbXOwp5bYJi2aq9QKD4QSR\n1+naEG1AUCH0ff9+yq8eB7H9C561luQ7y+SNlOBAFX+w4HwBgxJkbeyFb6KfewzRnEEFCUGxRPAD\nbyaXis5f/dWV13v2LOE/+SeEx45R+++fpP3UU9hOGxGGRCfuRIYRulZzBqx5jk0zbOZqNiIIIM8R\nYYgslZGlImIiwBsZQSiFiCJkEBDs37/yfOaNb0AvL6OnpxGevzZqAgg95ECEbWVIX4G00NYu7ZYb\nMBYrcDZGvkJdY1y76EZVZriMSDT43g2ppdgsv2IcWwy23djbjjWeEt+VIYO72MVG2CpyehTo+QTs\nx63vBdAPnAcObXawEGIv8DDwW8AvdCXpbwF+vLvLHwL/inXktA4/xipS3ATv7j4WwCeA/yiEEPYl\nnkO/ONtCJzs5QtBZDli+VOSQ2oL7V3GH8SAJ3IiEs2fmyUoglgTWbP/lx6ZJUy9i0AwGE5AX8Vf1\nI1V/+gThUIlgqLDjPH+20CE+vQw5mFjjP7Tnyp3NWaidR/kZ3kCFwokBwocO4AUh7el14z1yg84y\nlj7xCeJvP4OemcEbGsSf3EP5zW9GRRHWU6jBQfJWi2D/PmQUdVNrluDgAYKJCWSp5Ihqi9SQDAKC\n0VG8chmTJKjq2lESQ
ghUXwlbNj0rcKyXuPfdc6k728mgq5R0UZbrHJOBQoRrFwBSSihcv9aoJxlf\nGWzYc5xwHbjbIqdLSzHLLY0nBPtHQsKNxn3sYhc3GZt+66y1hwC6Ec6nrbV/1f377cAPbuPxfxf4\nZaBXkBgClq21vSLIRWDPRgeuwntxxLMafyCEyHFy9t/sEtAe4EL3vLUQotZ9vvnVBwohPgh8EGD/\nqhXyzcC3vnqRL/3pqR0epfALCYcfnNmZMMEHZE7NLpDJOoQ716uktHmhdpL+aIR2Wmdf+cQVcnrP\nXqp3jGzrcfJ2RjrTRlV9gqFuNV5a8lrqIohotTAhgUuPIYRBFCqoyRHK992HGh7DtFoU1GtpfOQj\n0HT9YHJsjPTxx9GXZ8Dz8MbGCO+5h4GHHyYYH0MWCgjfp3T33Vuf6A5qFrJYxGpN8txz4HmEhw4h\nVkWWQkmXPvMksq+4RhaO6HJjpkEL15fkKydosa63SWyTAKy1GOMiMiGk048Yutpwi8lyaKdOGt5f\nhEBhkxSbOOWo9dWWz9VJ3HdHW0usLeGuLncXLwG2++u831r7s70/rLX/Qwjxf252gBDiHcCstfZR\nIcSbe5s32PWaS3shxINA21r79KrN77fWTgkhKjhy+glcrWlbj22t/TDwYYD77rvvpkVV7XrKt780\nhblai7AFBAffkBNUxtB6ns1cxFeQ55ClFE6fpJTGyCBEllKWTYeMkO1OTwVo2QVanQVmO+cYLkxS\nogp3hOy57+C2jrfa0Hp8hvR8A7AU7x8n2lNFKg9/vIjp5KjRK7lKc/kFbKONlBkm82mfjom/8t+w\nzRZq716UgLFf/hVaTz+FWVrC37sPGg2oVLAmp++d76Tytrchu30u1hgnWlj1pun5ebJLl8hrdbyR\nEYJ9e5FFR5o2z8nm5pClknuMKLpmOi27dIl0+hLC9/FGRtYOFYwzbKLdW12O1krOI9/97UkXyaS5\nq1HpHNuVQ4vEzarKJYgkx3YylLGYUDl7HiEQvkK3E+xcA3KLqEbIQuhUfwCecufQHaxo4hTVV0RG\nAbYn1c+3XrQMlj3m6hmhLymHu3WqXbw02C45zQsh/gXwJ7gL/j/GiQ42w+uBdwkhfgQ3+7uKi6T6\nhRBeN3raC0xv8hjvY11Kz1o71f23IYT4U+ABHDldBPYBF4UQHtAHbKcmdlPQXIpZWtie+8NqDN0N\nldEBtEmwSUiaXsL3DZ6XIoRZS1RZBlkCWYpcmqGwNA3KZ6I9TdFmJEXNM+0JrmkNsQk0GZfa5xg6\ntJ+BV+3HZgaxDZNNmxvSuTbZXBubW+Szi6jIx8Ta9eAMFwhGusSgjYumxDC5jWmcv0jn7CWaX/4y\nNJtQKBAdO0rlTW9m8OGHKd5/PzZNWf70p9Fz84THjlF+4P4rxKQ1yXdOY9IUgUUWCqjRUbLLl8lr\ndfTCAqJYRC8uwtISptMhfvY50nPnyGZmUMUiJo7xDx8mnJhAz8+TLy/jHzpEdMdx9MICptVCeGpl\nqGAPJtUunYZwAwylBCWcSawxrm9JCSeUEBYTJy76yS0iNxhrsO2sG/W41JwuBwgkphg4r73Qh1oH\nMgM6x3ZSZx4buChMBh4EEpsYhJJX/PkCr5vesxtP5F2H/nJAf3k3XNrFS4vtktOPAb+Bk5MDfKm7\n7Zqw1v4a8GsA3cjpF6217xdCfBwnRf8o8JPApzY6XgghgX8IvHHVNg/ot9bOCyF84B3AZ7t3f7r7\neF/rPv7nX6p6U2Mx5vKZOlkTujkXtvNW++Ow51U+mQbfL6B1AYiwtoExKUotYUyClOCTIRqLRJfP\nEi1fRsRt/KSJTGNUs4FuCl7R79MyEefjfmDjQvy1YNEUjg5QvmsCbyDa0nbnCgTZbIt8sQNKYNoZ\n6bk6IpCuV0cAwpK3MpL5Bsm5AFmskL3QJLnUWXF0wPcRaYrJNNn0NKUHH3CigSii+ra
3kTzzDEhF\ntrBAurgEAtLTp7HNJsbzkTojOHSI5JFHAIFpt7CdDjZNSS9coPPUU4hCkezcWWRUIL98GVMuYeME\nrCWfnSF57hTWGOTJk+Svfz3+2Cj+xDgyKCCMwnZSRCEgW2ySn1+ETgq+gNxCKcQb7UMNVzAzDWyt\njckNsq8IWY5IMnpDiaynXK9Ublx6rlsnItbYwHMkZoFQQTFEpNqJLAo+ohi48RtSOkIKu5+zsVek\n6lIgyrsWPxtBa02apiildm2QbjFsV0q+iFPd3Qj8CvBRIcRvAo8Dvw8ghHgXcJ+19l9293sjcNFa\ne3rVsSHwmS4xKRwx/efufb8P/LEQ4gVcxPS+G3S+O0ankTr7OAuuU2mBcOQieXsc3RrDFYiaXIlo\nXDh04geufBwmV905QEW0DoBlkkSiVE4YNlFmgai1TOniM0S1WUid7Fxojcxh0IfES9lbrBPnIbOZ\nJCVgJ+vhO9/1VsqTE3h90bYFELqekC8moC1oiwGypQ42Naiyj819kvMNkrkOncdnu0IBgywqpBql\n8EoPWSiSnTuHLJWovPUtBENDVybMAjZJkZUq8alTJH/7JfTCInk3EtKzs27S7fgYquRk6MH4OLrZ\nQC/X3L8Li1CrIY4do3zsGHkQ4B85Qr606FJtQuD1D5B2G33xPISn3Fj40VGUV0QGoXMYjyx2to7I\nc2yqIXZRD5nB+B5y0AkvrO4KIZIM6SmsEE7p6EtQChV5mGaM9RXE3Ygo9KBSQFqLsRbpe3h7i5jB\nIugc6XvIytWfjc3cuYjAc1N0bwBs1ypISnlb9RclSbJSx/N9f9dt4hbCdo1fv8DG9Zu3bOd4a+3f\nAH/TvX0al4pbv8+ncdHP6mNeu26fFq6PaaPniHGR1kuOqOQzMF6iOgL1pSnGX/unmLSPrL6XxtTd\n6KyNVCVsVsXmHoiUO99VXqlTSCFXBtT1iEvr4e6/OZjLFNpn6Xvs8xQbs1f26tW33FRvIgEVP2Mw\natGIhunYKkmngTU54PzJrkVWD/zoezGBYXFxiqoapVCpXmPPdfCc47ZVLq2l05R8rgNSotIQlWjM\nhQzdyFykIAXkEn9gmLxhKT1wjJEP/gxoDZ5HevoMqlrBHx296qlMq4WNY2y7hWk23eDB7gXU1Bug\nc4hj4nqd9Px5TKvl0oVd2G98g8Yjj7hBVuUyLC87YhkaovKmN5LnOfnyMtG+fRTvu89NwO3vd9Js\n7cQPQghEOcK2ulN3ycE4dwcR+UhPYQbKkGqk53X7nSRioIgMfUdOuUFkBjVcwQrhLIu0cSPclcC2\nUmScYeYbsKTAGkS16KKp3KyZC7UysBCwOoXq9hcWmyFJEnTXdSSKotuGoJRSaK3d57jrNHFLYbvf\nsF9cdTsC3sOVS+Eu1qE6XKA8GPHQey3PnvltlJeTt4fwwibh8GOUhqapnX2A1swrgD72PxTjRceA\nQZT0VxFTD91CU5YBCm0GGPq7T1DtzF5T6uD7bveSn9IsjpB4g2g5CKG7WElrse06nTR1jZ5ZgrMT\nBTyPyTtOrKwiO43GtsnJ748oPTBJ5+l5RCjQNQ3zrr6SN2K8/gJCCbyCh6362I4hPDFAdGyQYM+9\nBMOlNUo3756rVXfeyDAIKL/h++lUK2TTl7BANj1FOjeP0Bqvr4r1fEyziapU0HOzmE7nqsdyQ45S\nWFhVQp2bo/Hlr6C0xj9wgHxqiujoUUw7Jb+wiOmkiHKErEbopaarK41V3biLWgsyg6oUUF3XDK8c\nYouj2ExjO5mTn3sSGbnP1WYGq42rY2njIp4AZDkEC3mau3pWkl2RiWcaIh+UXKsMhJWZULf6PAlr\n7UrkEobhSxK17Nog3brYblpv/ejWrwghvngTzue2gZSCzPs6hZJTTmVyAa985QLYd+BRiiMv4BWL\nqOA1WATWHMH3J6/1kCsozD1JeRNiAjehHR9yyhQ
Cn5qcAASEITZLyZMOFCtQDbE6RbXq6LiFD4wd\nPsbeE3dRn5vBaE1U3r5FgRCC/jfvo/R9w6QXGyx/5qxLf1kgFZh6iuoPkeWAwdfvQfgSf6yEDBTW\nWtKzZ8mbLbyREfyxq6MlcA2rvUgqPHgQm6ZYrZHFInm7jWm1kKUSeb1O3myB1nSmf4jGn/857Rde\ngO+cdqQEUCjgGKBLUj0EASgF1uKNj7vUTyfBtFLQOWapiZCSfKGBsF1370YMnQyMmw5sF5qIKECN\n90HXAWL1BfAqUnFvoNumxIqgQY1Vsc2EPEkRSY7wFGKwiPS9lSZbK4WrPykJ5dCJJvwbd8ENw3DF\nZPVGRU1Zlq1EY1mWvSQ1n10bpFsX203rrXaKkLjU2vhNOaPbBFp3mJ7+5MrfV8nB/Rw/WgQWsXYf\nJvexDGDtxOYXlLTJgdN/tOlzd3VZzDDERX+CZzgGCMdYQoIfOr+7YhWsIU9jjB8SXm4R7DvE3/+V\n3yAsFBjed2BtQ+cOoAIPNM4nTnWX8oFEDYaowQLFuwa7TagK4bvHt0niyATIlxavSU7rIYLAuT8A\nqlhEdWXivX97U2wHH34Hw5UywR13ODKTLurI0xS9uMjMH/0x+tQpole/msLIMMHhw8gwpHDsmEv7\nhL6Tg1uL9ByBrAlT2qkTIugc22jjZthDvtx2QhBjoeBjfYWeXoJ2iuwrovb0IyPPzejqukmsVtUJ\nIZzdUcWJHdaM1uiNQTHWpQKVdJ/XDZaACyFuOHmsjpR2az27WI/tLhlWO0Vo4AzOLmgXG8BaS6cz\nxXaV7JavYe0d+N7byHONt8lwp+LSkzip1sbQOHI6xxDPcSff4jAdyoC/dhKGvypVGBawwMTbfpQj\nx46t2M2Lbn/N9UAWfdRgRPHuUdLnlxAFD1Hx8UsBarBAOFl121Y9vghDZLGAaXdQfX3oxUWyS5eQ\nhSLBwQPXRZI2TUkvTpFdvEi+tIQ/OUl47Bj+KoGFDzAxQflDv3318VqvuEiocgSHfUhzrBIohHNz\nqLchCqCTYuabzqC14DuHcl+5ScKpk5eL3CLInSQ81ZgkRxYDGK6s1BxtbpyRrLVYIZC+cs2z1jX7\nrnnPfOUIys3K2PH781LC8zwKBdfzdrvOY7LWcn6xTSvJGe+LGNzMh3IXa7BdcrqzKzhYQddodRcb\nII6n+cY3P7ijY6S4ByHqWNtPjn/VPBINtD2fSjJDr9yXcfXkJQucZpxHuIdz7CfbyjXU91ce5cKl\nS7z6/vspdiOOF4tgrIQqeBReMYRfDkBAHmu8SrChJ58QguDQoRVboeT558GCabcdYZVLK/vm9To2\nz9c0wm4IKRFSoPr7EZ7Cn5xEBNv/6q63N1JKQeHKhdQLPFgV0TDW5+pAWEyaI4XAxCl2seXqRANF\nrBBX5lcpXGRprxSJXCOtxbQThKfIOxkkmXstlQirc0TuBgtKpaAUrJCWNRYbpyCv1LRuZdyupNRD\nJ8upd9zvda6R7JLTDrBdcvoqcO+6bV/bYNsugEbjNLB9R3AAazOcdSHMUiGLSpSymHKekArFnIpI\noiKFweOQzjO0+AzF9hwmrxF2hZQpME+FCxzgDAe4thZvY2it0d10142ADBVyZC3RrbEu2gBCiBVb\nIVntw8zNIQIfWbjSp5UvL5NedJ57NsuuUvLlrRbxs88iw4jw6BGCgwdRAwPYNEVWKmtI7kZCCLEi\n3RYIZNR9H1MNg26RIApdYj48TN6MEb7CK0dryFoo2Y2GBCjp+qesI7B8qYXIctf0u9CE/hIi8qEU\ngKfIl1rQiF0UNVJxisAdQmtNkiQIIYg2cczYxdYIPYXvCTJtKW/x3d/FWmzlSj6O86wrCCFezRX9\nTxW4Mcvr2xCVyh07PsbyDdJM0g4fYLk6jLSWXErSVNKUHjpyF7epiTey2HeEQn2K0fpzFOsXGZx+\niqH2WWqqyGn
/AKfZ0yWmnV+YRjeQbF8vppvTNNIGQ4UhhgtX0mi5yZFi62K9PzaKNzjg+oxWCwn0\nKqFofvXo+eSZZ8impp18faCfYM8e541njJOfp+lKjeq7AVEIsEmG8OQKCXkDJVRfccPJuiL0QAlk\nKUAYiyn42FrH1f+iwEVh2mCVcm4U1nORlKdcajHV0DaYUOEPV3csitBau5SiteR5viNy6smyb/eI\naLtQUnBstEKWG6Jdh/cdYSsq/2Hgp3A2Q7+zansD+Oc36Zxe9gjDQfr73sVy7dPX3McCMQIPD5eg\nm6LFM9T7fhYQGCGIvYhY+tBLQwmB9gdoBGUa/UepxXez78m/JPAXCWsXeDI4zpQ3waVoEvydr9KG\nhoaYnNxaLXgt5CZnOVmm4BXwpMdS4iLB+c78Cjktx8tMtabwhMfhvsNrXM83gtjAWFANDmIzNz3W\nG9nAjNbz3DgP7BXHBCA7f5686eyHwqNHt3Qkv1EQnkRsMNdqs5HvwlMrK0EF2FUpOiMlpuGiLuEp\n5zTe+7yLrvZl0xw73yKdayIHiqjRvg1aFDaG53nkXdLfCclkWUaSOAv+26kX6sVCSYGSu8S0U2zl\nSv6HwB8KId5jrf2v36Vzetmj3vgWYSQxtd4k2wEsAkUd51oXovkgUoWY/CIe38TgkZX+EQjtBvSt\nrHY3uHj7PmSQeCUWiscp1Z5EzAySjYWYWBEQk/obTKbdBIVCgQ984AMvKoVzsXmRZtZEIDjUd4hQ\nhiQmoeRdSaPV0hoA2mpaWYt+1b+j55hrz9HRHYaHhyn6Gwfv4dGjzsi1WMQfHlrZbroXTqvzNUKH\nlwNWRz9qoIQaKG0oRVfVIuQWa5rkrQSbaIw2CCuQk1fea2usI/ANoiop5YqSMcsylFLbIhrTlefn\neU6SJCi1+cTeXexiM2yV1vvH1to/AQ4KIX5h/f3W2t/Z4LDvaRhjiTNJagokjKM5gBU/iCTG2m8B\nKVp8P3ivwNplhKfQOiDzDyIZQ2qFCbb4Qacp4YXzSJ0SxznRJ0/S9BVNAnTZI6RJGoauT2ebK9+7\n776bKNqZ/9566FyzHC9jseyv7GdfdR9PzDzBdH2auc6c6ykRTjIdeiGlYGe1n3bWZrbjHDG00Rzu\nP7zhfqpcRh07dtV2v2vmKkslN+NpFUySIHz/uhSBLxU2uvALT6KGy+Tg5jcZumr3KwYvNsmwsSbH\nuZ57vrcmQkqShDzPieMY3/fxPG9bROP7Plpr8jxHKUWSJC/6O7WL711stRzqXT02kny9pEP8blUk\nmSHJJ2jZf0Cm3g62DCrEkGPMMRBFEPsBgxUj5PYgmQLhxeS2gEoLGA+uqWWwluDiBcpnXoDaMmOf\n+m8YPMIMDj57hmZ5Dqk1Zw8e5My+vQjPw1a2jqIeffRR7rnnHnzfZ2ho6LoiqMiLmG5OI4RgvjxP\nlmc8Pf80l9uX6Q/7Odh3kIN9B5ksTjJQ2EJltwE86YjNYvHlzutpqlq9alggQDY9jV5cQvg+4dEj\niJd5vUQIgTdSQQwW3XgNY5HDV74DNs2x1tJsNRDax49CisXiCvlIKVfSekIItuufLKUkiqKV/V/i\nOZ/XRI9Ae6S7i1sTW6X1/p/uzc9aa7+y+j4hxOtv2lm9jCGlILdFNBPO11VnV2hc9rlIZs2Fv4AP\nZJRQdM2kc66SkgNd+yJQi/ME584y9IXPEtaW6P28ysZQqNcxQPjMM4wvLpJEIc8fPEhjfPOeaa01\nMzMzjI2NobUmuA7BQGYyIi9CG835+nmGC8OkJl1JPympUEJRDK6k4xppw227RopuNQIVcKjvEIlO\nqIbb9PrbBvKG89uzWYZNEsQNktK/1JAIGCg5N/jVasDAI643yWyOMBK5TlQSBAFKKZRSxLHrIDHG\nbOtC3nP37hmp3mpYbZmktaZUujnKzV28eGx3efx/bXPb9zxCXyJFxnK7u8HzX
Y2o9/8WEYk0uDam\nbN0d3VWonL1EYeYy5SceIawtXVWRUrgqlZKSQhwjdE6Y585dews8+uijKKWu+6IyUZoAC88sPMMz\nC8/Q0i084VFP6/QH/bxhzxsYCAeYac3QztrMd+Y53zjPmfoZmmlz6ycACl6B/qgfKW5c+s0bHUF4\nClWtIAqFrQ94mcC2U2yssa1kTRQjQg9RDomqZTzPI4rWmsP2bi8tLVGr1dbYDG0Hvu+/ZF55u7h9\nsFXN6SHgdcDIuppTFdiNhzfAcjPm2xfirXdcBx9FRo4EPA2pxYVPvd+31gTnzzH8+c8Qzc1Rnpra\nVCge5Dm0W8SeB8ZQaTZpWAuragCjo6MsLy+vzLPZs2cPQ0ND113EDr2Q3DrF3mKyiK98pppT4xOt\n4gAAIABJREFUdHSHp+af4p7Re8hMhq98kjxZEy0leUJ5q4bhmwRvYGDrZt6XI3qEtEF2LYqilTrS\nRouRTqeD1m46b7vdpr9/Z8KVnZ+qO8mbLaDo9W710nq7uHWx1acT4OpNHrC6cFHHDfTbxTpML8Yk\n1+nX7ghKI41FaIEVGlnXGAH+8hyVp58kWFqm8vSThFm66WMFgJdmqPl5Lk9Osux5tCr95NKjYDQe\nMDs7i5SSUqnE0aNHecMb3vCif7BKKkIvZDleZileIjc5mcmYbc/ymTOfYbgwjKc8DlYP8oqhV6CN\nRgnFQOTIIcszcpsTebd+Id1a6xpsAQLvllOmiWKATXMnOV93bkKITVO3YRiilFr5ftzM2oyz++pg\njCEIgutKKe8EvZTlLm5tbFVz+iLwRSHE/2ut3ZnlwfcoBALJNWpGWyHL3AdiLZVmh76pGl5m8Oan\nSVQD5hYpP/4NCs1rp8B62cBmVEBj6Y9j7jl5kr986I184RX3Mdapc/jyBbdyDAMG05ihoSHe/va3\n3xBl1YPjDzLTmqHklbhr+C7iPOaL57/IufY5vnjhi+yp7OH4wHGGoiEutS5xqHqI0HOO1x3d4Wzt\nLAbDeHGcocLQ1k/4UiLRzmoI97kT3lorceGpK6PatwFrLXEcr9SLRkdHVwhjMxhjdiQ53+h4Y4xz\npU/Tm05Ou3h5YLvfpLYQ4t8Ar2TVvO/tDBsUQijgEWDKWvsOIcQh3Ij2QeAx4Cestem6Y94P/NKq\nTXfjrJJOAR8HjuBaiP7cWvur3WN+Cvg3wFT3mP9orf0v23x9NwwDlZBqOWa5ucOBOj2xQ0sz8fwC\nI1NtFK78FNucpDmH9+hXCVtXake2e78F6mEB32iKWcbZoVH+9tUPMNpY5hXfOUU7CPnwO9+LLhSo\nJVXmohJGeYy1atCp89NvfesNk/yGXsh94/dxuXWZQAWMlcY4OnCU07XTtHWbi42LHO0/SuRFPLf4\nHOfq59hb2csrh15JI2kQ65jAC2hn7esip8V4kaV4ib6wb40rxS4cjDEr1kS9MRg99BwhwDXUblcs\n0JOeZ1m2Qiy+7287kuzNUorjGM/zSJJkQwf0F0uCu3h5Ybuf8EeAjwHvAH4W+ElgbpvH/jzwDK5O\nBfAh4N9Zaz8qhPhPOHfz31t9gLX2I93nRAjxKuBT1tonhBBF4N9aa78ghAiAzwkh3m6t/R/dQz9m\nrf1ft3leNwXCgzSAWBlkfoWeBOBdq0zXJSZ/qc3IuTojUx08HPt6QDnrI2qVyMMhGOnH1i9Qy+pM\nj04gjOHk4eO0iyWOTV/g8IUzfOSHf5QnT7yCPfMzPHXgCF++617issvKNoplGsUyfe0mcRjxngP3\ncODAgRv2+jtZh0vNS4ReyInBEwwWBinIAt+e/zYXmhfwhMdyssxkaZJTi6dYTpaZac2wv7qfuc4c\n8/E8A+EAh6qHNnz8DWcgrbrvUusSAHE7ZiAcuLmd+aHnPl+BcyJ/GSDLshUCklKuiVJ6tkPXW4/p\nNd8KIcjz/CqhxbXQI8pe3anXzLsevaguy
zKKxeKu4OI2x3a/gUPW2t8XQvz8qlTflsMGhRB7gYeB\n3wJ+Qbhv6luAH+/u8ofAv2IdOa3DjwF/BmCtbQNf6N5OhRCP4ayVbgkYa7nQiWkLg1agcrCy+38O\noc1dbbo33iiz7kZm8GLNnlM1+uZjBE4HsfLTC0Ls4FFozUJ7jjwscdJvoIsBF0bG+LMffjcHL19k\nsX+Az33f/XzxtU7lv9w/wLeO3wnq6o+5VuljCPjxV991Q9+DRtZgrDxGqlPaWRtrLUcHj/LrD/06\nv/PI73C+fp6vTX+NTtqhnbeZak5R8SvcO3YvUkomy5NU/epV0vLc5JytnyXO42um/IQQRCoizmNC\nGd5QRd9GEEK4abQ3EM5VPHPWRuGNr2NtNkOpJxaw1u7owh+G4YofX4/80jTFGLMivNgKvWhoMwl6\n77241Wp7u7g52C459UoZl4QQDwPTbI8Ufhf4Za6IKYaAZWttTzJwEWcsuxneC7x7/UYhRD/wTuDf\nr9r8HiHEG3Hpv39mrb2wwXEfBD4IsH///m28hO0jzg3tTJMJi45cyk0o50RkAJ1A0J0nJ3NLhsXP\nDFErozq9TP98fE0FXhAEpJOvIWvPsFAscGp/gb8rt3h+zwEIAp49fJxnDx/fYKrhNSAlHzw4ypON\nNg/23ziVnETSSlosJovMdeawWMaKY9w7ei/Hh47z+QufJzEJn73wWfaX96+s0L85/U1et/d1eNLb\nkHjqaZ3TtdMuckJcM+V3sHqQju5Q8Aovy4uYjTPI3CJGSLFm6OCNgO/7K8SjtSaOY6SUK2NSRG8S\n7w6wOgKTUq48Zq+nqBepFYvFaxJVjxgB0jSl3W7j+/4aouqRYM9eaRe3N7b7zf9NIUQf8L/j+puq\nwD/d7AAhxDuAWWvto0KIN/c2b7DrNdvIhRAPAm1r7dPrtnu4aOo/WGtPdzf/OfBn1tpECPGzuKjs\nqpqYtfbDwIcB7rvvvhvawq61ZjkztC2u2cgHu6oHFwWpcRGVtBZfWGSaU5ltMXmmvaWHeFCo0ChV\nOFOG5SGP5ye97ZPROvy9/hLn4wyv1rph5DTfnufnPvdzzLZnOdB3gLcfejuloISxhmbWZG95L570\niE2MRnO5c5n9lf0EXkBLt7jcuswD4w+sRE3WWlKTUk/qtLM2uc3J8mxT5wElFeXgpZGk3wgIser7\ncpPItUcQnU6HOI5XUqWFG9Dj5fs+Sik6nQ5CiJXxG+AIqFze/LPpiSLAkdRqclqfhtzF7Y1tkZO1\n9i+6N2vADwAIITYlJ+D1wLuEED+CE1FUcZFUvxDC60ZPe3FR2LXwPropvXX4MPC8tfZ3V53jwqr7\n/zOutvVdxXQnQwq7lm1Xc4cBcsil+98YQynpMHyutu3hFoGBYQ3fLNmVuUc7xfsLMCcV7dzQ7ynO\ndhJGfI/SDpRdPWQmQyK50LjAr3/p1znfOg/Ac8vPUZmq8PrJ1yOFpOgVGSmMcPfQ3Xx95usIBAPh\nAG/Y8wYCP6CjO3z+wud5YfkFHj70MEIIFpIFZluzjBZHSU3KntIeELCnvFWw/dLB6hyyHHwPcT2T\naSPfuTkI4dzGbyJ60Y2U8pp1nut93J6Yot1uk6bpmlThVvZBvfPZjY6+t/Ficga/gCObDWGt/TXg\n1wC6kdMvWmvfL4T4OK5H6qM4YcWnNjpeCCGBfwi8cd323wT6gJ9Zt33CWnup++e7cCKM7xraOqeT\namrp1fOFVrC6kJRZ8qKleqG2slrezjq5Akx2YKbqX/fK+mkC3lQKmAxCjpciajqnkxvuLG9/5WyM\n4S9P/yWNtEHBK/DY7GPMtGbW7HO+fp6FeIFvzX+Lk8MnqYQV3nTgTQgpmO3M0hf28drJ11IKS3z2\n7GfpC/o4WzvLJ09/kv6gn7uG76Kt284WSUUc6nMiiVu1B8pai211hadZjqjuPBIR4san8q6FQqGw\nIl64W
RFJoVDA8zystQRBsKV9UC+C65nH7uJ7Fy/mV3C9OYdfAT7aJZnHgd8HEEK8C7jPWvsvu/u9\nEbi4Km3XE1j8H8CzwGPd3HhPMv5z3cfQwCJuDtVNh7EuUkrznEaes3lr7Cr4CpFpOmNF2p02fqK3\nNbd2FvjJB5R797N8ZfLqTvBkJyWbW+aV5SJIeEW5QHnVBXG+M89SvEQ1qDJWGrvqeG00v/fE7/H1\n6a+z1FliqbNEMSo6ayHdTyfvUBAFalkNbTRPJE8w3ZpmX3UfD008xPvufB8lr8TxgeP0RX1keUZ7\nss3HT32c8/XzDIQDGGs4WztLoAKEdTWmW5WU1qArdnk54Eal8rZ6jtXEtx0zWCHErlR8Fy+KnLb9\nE7TW/g3wN93bp4EHNtjn08Cn1x3z2nX7XOQapLg6UvtuoZ1mfHZ6nuXM4gN93s75OgsUOhQYhdOO\nb4Eny7BU6CUBDZu5SL1GwrcNJFzdFKwtnEszJuKUu8oFDkRX+kpm27NYLPPxPIOFwascwE/OneT0\n0mlmWjMsxotYXIe/QFBVVSyWzGb4xicXOVa6r8q52jlGC6O8Zd9bXD9LnvDYzGO0dZt21mZPeQ8S\nyZn6GR6aeAgrLAPRABdbF/E8DynkLd27JISAYgj6+hYN3wvYtQ/axXaxlbdeg41JSAC3j0PmdeCF\n2gKPzi7QtAWkUORArHe2ZLahT3MwYPCyB3lGxtaD1UdyeNOc5ptDHvVNPr1/NFTipw5MUlaSr9ca\n/OnUIqdbMRp4dSkgUx4TvmIw8JkIfbxVU1mLXpGWbhHKEE+sfRJrLZ2sQzNr0spaKBRGGDKbYbEs\n5osApKRgoaIrlAolltNlAhVwpn6GTzz/CUZLo1yoX2BfdR9SSEpeyUnFdcxgNMgdg3cwH8+TmpRA\nuZV3J+vc8t864Um4nlrT9xB2Yh9kjEFr7RYnuzWo7ylsZV+0s3Gq3yM4t3yBf/vIHzBbn2eoso/x\nygn2VV/VTUfsQD2nJJ2KT1zy0AWB6mxNbgdieHhaM960fOTwxh9fCPz43jEmo4Dx0OcrS00iT7K/\nGHBnOeSn944hgUhIhkKfkXDtOe+v7ifWMaEKr5IVP7f4HF+6+CWaaRMPjxYtPOsREJBwtfN5gwZD\ncojIj0jyhKXOEl+++GU85eoQiUm4a+guPOVx19Bd1NIaw4VhikGRhwYfAgtLyRLGGoaLt27UtIsb\nA631ihFxEAR0Op2V/qnd8RbfW9iNq68DX7r4Rb4y9UnAcK5ZYrx+gpLfTykYILAlhJD42yWpQHHu\n7gF0lnDkua0rVmUL9y7Bksph1VykXuNuASh7it94YYp/fnCcqcBnQMHhyOeylOwJQp5pJ9xVLrCv\nEG6o0JNCbjhfqRbX+MKFL7ioSbdo5S0sFo0mFCGBDVzEtAoREeWgzEhhhKVkaWWMu0YTeRGj4Shv\n2f8WLrcuM9Oc4XLzMi3VopbUODlzksiPeM3YazhQPbBh/40x3VEi8uXX07SLq9Fr3l0/P8oYQ7Pr\nKXktdwhr7Y4biHdx62KXnK4D00tT9PqSU1tnpn2Kp2c+w3jpOH2FCUpBP2FewROhI6rNepF8D8ho\nTlRoTi8iG3bL1N6ihL/YIyE3oCRFoKAEhyKfqSQnsZbLieZ3z81SDT2MhSPFkFpumcs0lSTDKxeI\n1M5+xLWkRi2pIYXkaP9RllvLLJklR1BWo7najl1JRX/Yz0A4wJMzT7KULyGR9Pl9IOBI3xHm43m+\nOv1Vnpp/ilPzp/A9n78+99eU/BJKKmbbs7z76LuvEmdkiaa53EFIQbm/gLdb53nZoycjF0IgpVxp\nvM3zfCWKStOUKIpWnNN7iON4pZa1Ox7+5Y9dctoCjbRBM23SH/VT8FzB4/jIcTh9ZZ/ELPPMwmc5\nvfR1jgw9xGj5KHFapxKNMlY6RpF+BJsP8Ss3c/x0g7RePwSHh0kfm1/
ZdLkEl4rusSTwv+0boqwk\nVT/gD6cXOBOnlJWg4Cl6Y6Gm4xSM5VyWEinFUpajriFFj3XMQrxAySvRH7k5PlPNKb5x+Rsc6T9C\nO2vzquFX8e3Fb9PpdNy02w1Kkx4e2mpqWQ0pJbW8hsWSk7OcLTNWGuNr01/jHYffQTtvo42mbdrI\nTJLoxDkB5JL5zjxpnmKsoZ21Cb0QX/q06jHNWgwWlKeoDNziBamXMfI8XyGMm4kwDFdMY3tOED0z\nWHARVI+EtNZUq9WV7T0nip0MRryVsNxOMRYGits3zb2dsUtOmyAzGRcaF7BYGlmD4wPHAXj70bfz\nob/7EC1aK/umpkVqWpyc+Qzl+W8glWKyeieL7fMMFg4wVXuKUjjMePkYk9VXrkn7Rb5PWpArbUs5\nVzR4/kgfIll74b8QChYiBUpyZyAZLETcXSlSVBLlKZqZZiwIyKzlkXoLX0JZSs7HGbmEft9jcZMf\n8MXGRRKTsJwsU/ALSCH58sUv88LyC3jC451H3snHnv0YraSFh4cVTp23EaSSHK4eRilFQRRoWpfW\nCwjQuaaiKgQyoN/rR1pJWZWxWMYr42g0cRrzyMwj1JIaP3bix/A9H094HO0/eqXNS8CN0m+b3GCM\n3Y3CViHLshVD1+165V0veuaz6xEEAcVicaVHCtYaxPbICrjp8vibgaVWysWlDgA6N4xWdyO/XXLa\nJuSqifYzrZk1xLQahg71vAM5JMtNSv4QWfoFUtFkINpLRy8RemVGygdR0uPhQZ95Qs4cs9TPNPAu\nZ3jakZMYi7AdjVUC+hUs5/jAjwQ+8cExRkLJkUqJZW2oZTkegonAJ/M97q0Wqfoe7xofwFrLQqr5\nVqPNuTjj/2fvTYMsS886v9973rPfPfesrKy9el+lbi1II4SQEAKFwMGqwTAQGDAO2xg8hoGYIfjA\nrEwM2J4wY5hxjBwwiFF4MGJA1gxiBAgJoW6pu9WLuru6utbMyn25y9nexR/OzazMqqwluyursrvv\nLyKj8uY959xzb917nvs87//5Py1Pck/l2m9+x3HAlDOK2lmbhWSBp+ef5qvzXyXTGV84/wVmu7Ob\na01D7hCzxey2Y4yLcSrVCu+ZfA9Hho4w257lwZEHeWrhKQyGqldlPVsnLVL+/hf+PkdbR7nYvkhG\nRqGK0lfN8TYVey/aF/ncuc/x7ce+HWUV2mrieoTRZS9yVLl6xMJuMdrQa5ff0L3AJYhunamr0QZr\nQb4BlXwbGcnGSI070RwrhCCOY6IootvtbprKbpDn+aY0/bpl9H2KMpe/XOmb6AV7KzAITtfBczyO\n1I/QKTo0g7K8tZ6s86lnP3VT+ydqhUStXL7dWaXdW6AqW2DW+eaJEf635z9PHBzi7uF30nl8nPBS\nRmshQSQGco1xBf5IhDsakry0huMKRt42xi/cfRBlDF9Y7ZAbw2JhCKXgi6udzYzi/cNlyUMIwUjg\n8c1B46bOe7o2zVq2RqYy/mrmrzi3fo6XV15mKVkiNzmLXC4xCiGYK7Y7Qwy5QzQqDY40jtDWbZ5f\nep7VdJXZZLYMfAJWi1WEIygoWMgWGE6GyXSGg4N0JLEb40ufalBlOVkux7j7VSIZUfEqpbxcQnOs\ngrW3RhBhtlwgjLp1dj5aGZJOGfT80MMP31gfuw1nh2uNdL+dbPjzXTk6xXEctNabs6HeaAxXfJQx\nGAtjtUHWBIPgdENiL95Urimt+KUv/BJ/PnPDaSHXwNI2l/jKxf/AUucQ/+75p6h4wwyF01zqnuf9\nBz7Io5PHcRYT0ifnEaGLrAe4IzGOI3CrIbIVEJ4cAsARghHPxUGwXCiWMoWylkA4vJJkvP81nqXn\neIxEI5xePU2qUnKTE7kRgQxITbpt24pbKRVWW9p8HekwWZkEUR7Lczy01VTcCm3ZJjEJkYywWBzh\ncKhyiMcmHiM3Oe2sTezHxG7M4fp
hfOnz6VOfpuk0qTsNDtcPb5vRJIQAa+mtl+WeIPZec0lOug6u\nLzHa4t/irGmn398oOI6z70plVwagjcZeKa8eSQ/QyxUzKwmuFBweruy7AOY4gsnG/nqN7zSD4LQL\nukWXZxefvfGGNyBhmefWymbVXrGCdFzW0gZt9TzO8DH8dZci9rCJxq24uK0A21XImo83GiOj8r/N\nEYJjcciY0riiLMOtas2K0jxSe/1v9NF4lIO1g0RexLHaMZ5bfo5nF57l7NpZlFUcqB7gQO0Ac505\nTq2f2hRFtLM259bPcax5jEPVQ4xVxnhw5EF86fPc4nMkKqHm17indQ8nGic4Onx5sKA2mifmnuBC\n+wIKxaHKIe5xH6CWDtFcbJAmOZXK9uemlC6H0GWaNClojlReU/lMCEEY33qPOdeXqEJjTVkuHHDr\nuZHl0cxKwjMX1nhprs2RkZjveuQg0RtkQORblcEnZRc0wgauubUvmcWwns0xFj7KcNQiMSlRq0Z4\ntIFJFNH9I7jNgGKmC1LgDm1P+WPpEG+RhH98chhlLP5NyMTbStPThiHPxduhLFbzazw28RjGGiI3\n4rvsd3G+fZ7TK6ep+bXSSkjA0wtP84lnP8G59XNYLGPxGL7ro9G8sPICr6y/QsNr8N7p9/L+Q+9n\nLpkrjVzrR3GEw6XuJcbj8VKhJRzOrJ1hJVvB9ix1v85RTlBgGJUTdBaTq4KTdCVFlrA81y6/OVto\nTVSvOzG3yDRg8fZgoN+VCCGIqq9/TWy/oZTCGPOGcG+wwPOzayx3ckLP4dRCmwenmnf6tAZch0Fw\n2gXWWg43DjO3NHfjjXfBT9z/EzSjJg8MP8BIZQQ3knitABG65cA5IDh2c+tFjhD48sYX21QbTifl\nOkhHa47HO9e5A3n5ouoIh8P1wxyubx/rfqRxhMfHH+dLF77E2c5Zzq2fwxEO1aBKrnLWsjVW0hVq\nCzUeGXuEsXiMQAbMJ/NlybSAyI1oBA2EEEzXpkFAJCNqfg13SCNXI2I/IqpffZF3HIHrSzzfBQF5\nvl2JaK2lu57SW89wfUlcC1C5RitD2s2JqsFmRqNyDYKBWu8GGGM2lXNa631X9ruSOJCMVgM6qcaV\nDrUdMthCGVKlqW2ZbmyMHTR43yEGwWkXaKtLf7dbyE/c9xMMxUO87+D7OFTvT+Z1uKlZQEl7HaM1\ncb2B2OU3160rH+Z1ioNSlbKYLBIEAe+sv5OPHv8oUki01Xx98es8cekJQjekGTSp+lWqbpXVbJVO\n3uFC5wIVWSlnNfV5bPwxRuNRal6NieoEdtjSXUsQjkO8Q3ACiCo+WTVAKU1z5PKagrWW9krC2mIH\nbLk+VQQSgSBPFdbC2lKPsOJTqQWbooiw4g8C1A3YkHTfjNP4ncZzHL7pxCgnx1KGqj710GOlm1MJ\nXDwp6OWaP39xgUwZjo7GRJ7kwmrCUOwzPRQz8ibMfPc7g+C0C1zHxfduzZpElSofOvYhvuee72Es\nHsOTu1uATzsd1hfmAUg6bQQgHAfp+wgE1aEhHOfaF9dYOkyHPj1tGH2d84NSlWKF3fxdOpLjzeMA\n3DV0F99+5NtZSpew1hJ5Ea2gRUd16OQdvjb3tdJLz4t5fPJxAhnguz4nWyc3j18UGtEfjNddS5Gu\nJKxcblTU2pD2CqxjyZKc2VeXGRqrUhupsL7UY3W+TdLNcaUgiGuElQCrbV9Fl4O1WG1IkwK//43a\nvN6I/SbHcZxNEcKdVvDdDKO1ACEgyRXtTPGVMyu4rsB3HCYaIYHrkPUVmqfmO0y3YtZ7ClcIVrr5\nIDjdAQbBaZf86nt/lY/84Ude8/516vzYIz/G8eZx3jP1Hnz3+sEu7XZoLy1ijaU5PoEjJWeffZqL\nL72AMYa4Wmd14RJ5r4vnBYwdO87I1DS6KGhNHrjusYc8l6FbcF2peBV6RQ8Hh6nqVFmW23q/X6Hi\
nXzbttNYSyYhu0cVay2KyyN9c+huUVVS9KkcbRxmOhje3L1KFtZa0m+OHHtYqVO5sluJ0YVhf6rE2\n3ybp5IRVnyVjCSoeeZrTXklQmUa6gqEpgSMEMpR4vkS6Dt01W2ZltQCjDJ3VhDwpqA/HuLdp8N8b\nDa01RVG8IaTbG7LzeuQR+S5JNyMtNEKBDD3SwjBeC2nFHutZwV0jNQoNvisIPEmrMhgNfyfY80+e\nEEICTwAXrbUfFUIcpZyCOwR8Ffhha21+xT4/BPwvW/70EPA2a+1TQoi3A/+W0uP0T4CfsdZaIcQQ\n8PvAEeAM8P3W2hVuMQebB/n8932ez576LH985o95ZuWZm943IuKX3/vLfPDoB7fJoa9Hd2WF3uoq\n7eVFFs+fQXo+L37x85x79uuoLEMGAboo0KogqtborC4DgqIoiBsNgnjvnZzbRZt6UKce1Kn4lc0R\nF9dCCMGRxhEiN6JX9GgXbVphi+V0mSLViPQ8ldEaYf+iIBzQucH1JBZDnujN0cGe7+J6DkWmwBHg\nCIy2pVpPCIpcU2QFIBDSIe8peuspXugiXUm1GRHVgs3z6qymIARKGXrrOfWRQXDaiTzP0Vpvyrd3\nO5vpdpm0nlvqsZYUtCoeB1sxIzUfTwrG6iHKlI3RtdClFrm8/56xzf3SQnPvgRqe4wzWnO4Qt+OT\n9zOUI9Pr/dv/FPh1a+0nhRD/Cvhx4De37mCt/V3gdwGEEA8Cf2itfap/928CPwn8NWVw+nbgM8Df\nAz5nrf0nQoi/17/9C7fqSWz4zcVuzHA8zOPTj/Pw1MP8/J/9POd6566774PNB5muT/MjD/4I94/c\nv6vH9aOIPEs59/WnybOUsFJl/txZsjQBY9Ddy7ZBeZphCoUjJXm3y6VXXmZk+jCVZus1PeebJZQh\nAoHFErtXu5nvhCMcJquTfOzkx5jrzrGarZIUCZGuELsxqtAYbciSApUbQFBpBBhtcJxS8FCkCs93\ncaTD8FQDb9ElrufEtYjGWAWjygtgbahCbz3FcSVaGdZXEmrNEEdqHHlZHKGVRhUaVRhcz8H1nc2M\nzRpLEPtvSIeHvWCj6fW1+O1ZWw6nNMbgeR5BsDclM6UNa0n5+VjpFkw1LZONiMlGtHlfJXAJd1hb\n3OlvA24vexqc+mPVvxP4h8DPiTL//wDwt/ubfAL4Fa4ITlfwceD3+sebBOrW2i/1b//fwHdTBqfv\ngs2+009QTt69ZcHpYuciqU5Lvzm3VJEtpUv8g3f9A37tK7/GS+2Xtm3fFE3uHbuXY0PH+KkHf4pW\n9NoCRG14hNlTL5H1OiSdDtroUo1mDKJvI+M4EqMNleFh7vuWD+IgyHrlOIu1hfk9D06xF3OieQJt\nNb7j8/zS8/SKHne17qIe1K+770g0sjndVhvNWruNZ3ykK0l7BVkvJ08KpOfiSEFcD1GFKdeQPAc6\nKVEloNoIqTZKxaExhqXZdVRWBrhKPcD1HKQnKQpN5DmowuBLhyJVZUDKNdZa/MDF9yV+5OOHLkWm\n0P21iLSXI4RAF5og9vDD/b/Wslf4vo+UctOcdTdsjMSAy9ZIe4ErHWqhSztVNKLtZqoS9C7FAAAg\nAElEQVRnl3v0Mo0QcPdEDW+XDv0D9p69zpx+A/h5YGNo4TCwaq3d0PpeAKZ22nELP0AZeOhve2HL\nfVv3H7fWzgJYa2eFEGPcQqQoA4Gg7MWZqEwwHA4jW5J/Xv/nvLT4Ev/oS/8IJRQfO/IxfvThH0VZ\nRStoEbgBq2lporpVmn2zuK5Ha2oaMTfL4pnTOFrhV6oMTUxSaQ1jjcH1fe55999ieHoaL6ow843n\n0aos963MXqTIMmrDI0S16weL18pGKe/06mm+OPNFXOFS6IJ3T737po8hHclQo4k1FuGIUvzgSdR6\nBqIULziOQxB7aKVJOhmL59cIqj5jUw2Cfhkw6ynSTo6l9Flrj
MX01jJUoSmEJqj4BJFPEHll4Ck0\n0nP6Nkjl8TcyJGdLpqQLgyoUqjBobXCk85ZV9N2o6fV6bDiN3w4xxZGRCkob3CuCT9F36rAWtLG8\nRf8b9zV7FpyEEB8F5q21Twoh3r/x5x02vaYsSgjxTqBnrd2wZdjV/tc45k9SlgU5dOjQTe93sHaQ\n1Wx10/MN2FTYHW0c5Uj9CB8+/uEd9z27fpZO0cHB4UTrBJ5z/Q+kLgpmT73E+sIc8+fOsr64AEbj\nV6pIP8BYS+D5PPitH8IPKxhjqLaGkJ7LwplXiepNGmPl7CPX98mTUv7eXV3Zs+C0QTtvo41GoVD2\ntY0u2Oztij2KTFMdisqyWz84rM53aK/1KNLy+FoZZgrN6FSDsOLjSIHorze5rgUraIxUyNIC13Xw\nI39zHUG6DqJ/3fJ2ED9I6RDXQ6yxZYa1Vj7mYB3itbPhbn67uDIwAUy3Yi6tJ6wniourCVPNaFDK\n22fsZeb0HuBjQojvAELKNaffAJpCCLefPR0EZq5zjB+kX9Lrc6G/zwZb958TQkz2s6ZJYH6nA1pr\nfwv4LYDHHnvspgOb67ib5aeduJ5iqdBl3dtgUEbdMDgtXTzP0sXznP7aEyycO4Prunh+wPiJUl4t\npcvkyXtojB1g9dIsWdpj9PBRVi5eYHnmIvnpU4wePkp9ZIyxI8dYungeo9SO4og86QHgRzEqz0EI\n3Cu+zRbGlvZIN1BlXUhz1kWLA/W7iRzFI6OPXHf7G+F6EteThLFH2isoMkV7pcf6Ug+tNL1OhjCg\nAo3jCXrtjCLXxLWASiPETYq+V57BGIdqI6LIS0GEIx2iql+amd5Akef0hRaOFDgyRuV689wG3Fm2\nTs3djWqwErjUQo9eZuhlmsVOxsHWza2XDrg97Flwstb+IvCLAP3M6e9aa39ICPEp4HspFXt/B/jD\nnfYXQjjA9wHv23LMWSFEWwjxLuDLwI8A/3v/7k/3j/dPrnfcO8GB6gEWkgUqXmVzYOH1cH2fZH2V\ntblL5N0ONgyRnoc2lqzTxWBYX5xn/tVXufjic/RWV1ifm2f44EHmz55m7tXTPP+FP2fs0FEe+tC3\nceLt7yonhF4RdJL2+mavlB/H5L0yUDUnDhDE5Qf1fJqzXChix+FEHFzzAlAYy1KhqIcjBG6Fh2s1\nnOsE4dwYlgpFRUrqV4yKt9aiCoN0S6WUcARR1cdYU05XFFBkGs+VfdVXebvXzqh7DirXSFcSVR1U\nUa5pyP6359K2qDRg1crsKsBsBLIbBbMBt4cNYcWGVDyO410FqKj/f59rzcyqopMpDg9VBp57+4Q7\n8Sn7BeCTQohfBb4G/BsAIcTHgMestb/c3+59wAVr7ekr9v9pLkvJP9P/gTIo/XshxI8D5ygD274g\n9mIOe4dvvGGf5sQkwvUo8gz6qqjx43dz9qkn6a4uI32P3voqRhWknTbWGPJel6BWZ21+nvbKEijF\nrNaMvnyEk49/E+4Vi9ZaFbSXF9FKIV2XtNPebNot0mQzOK0VZRmrZwyX8oKuNjRdl5EtF+hUG3Kj\n0cairUWIgL9c7vJ8L+PhWsQ3tepXefedTXJ6xgCKeyohQf/8tDL01lMQ5dpEXL8cED3fxQ88GqMx\ncT0k6aQUmcb3ZLlmZAydtfJv0nOIayHVZlgqyvrByfUkuTYIR2wGLGNKRR52oMh7I6GU2rRQ2nCp\n2HivXDnCfSdqocfJ8SqX1lLaqaJQluVezpS/v62Y3ircluBkrf08pXqOfrB5xw7bfJoy+9m6z7t2\n2O4J4IEd/r4EfOstOuXbTtrpoIqcuNFACAdHSvwgpEh7OI7klSe+RLq+BoDOc6rDIyxdmqG7utJ3\nhvBojk/ghiH0J4Jm66vY/oKKtZaVSzNk3S7V5hDd1eXym2d7naHJKaJGg9XZWaTvUUQVFrOcubTg\nYpqTaEPDc1nJCpq+S1cbW
p5ECsFz7YS/Xm1zLs05HvrUfUlFSH5/bgWL4EyScbwScTDc3vu0tZ66\n4X6jdTn3KO3lSNfpN9yyOZ/KD1wawxW00gSxjxDQWU1I2lkpNxeQpYpeO8WPfILIIK/IyvzQxfUl\nYkuZUuW6VOUVGmOg2hzM09nvKKXodDpkWYZSilqttjkJV0pJnuc35fcXepLRWkAnUyhtKJQmLfRg\n/WkfMKhP7APyNGFt/hIAKs/RqsDzA6pDwxhrMFrTnbk84E8GIc2JKV7+8hdQaYYbBMTNFnm3x/T9\nDzL7jec2tz31lS/xzR//EdbmLnH6iS+TdtpUhkYIKxXqo+NUW0PUx8dZmbmIcATnC83XFzv80fwy\ns3mBQHA4dHmgXmUs8DgmYNz3NucCX0xz2tqQaMu6tjjaMhQ6Gz2yeELg71BpORz5LOWKinQI+xmM\n7VsG+aGLMeVMpSuFB+Wgvstv21orptaKyTNFbzXFkaJcE5ICz5P02hlZkhNWAqK+mu/KY0rX6fvs\nWZxCbaoFB+xfNtaaXNfF8zwcx0FKuZlB7WZabyVwuWeixktzbZa7BTNrKY8cbBIPxpvcUQav/n5g\ni3Hm+sIlXvjiX+L5IYcfeBBVKL7+X/7zts0PPfgIvbU10rUyk1JZytypl3Fdj9HpQ6Wlgi2lsu2l\nRWZffrGcd5TnZGlCkKVUD07jhyHV4RGwpUIQYLmXMC8iVrUhM7Yc6aElXW2IhENHG4Yt5NYSCMGR\nOGAxL1DGcjjyORYHRFLy41OjvNhLOBkGjAVlUFD9dalIOtRdyYErsinXk3iBW2ZNkYfgsvXMjfAD\nF3+8SpYU5H0hhAwk65fa5WPnCWG8vddFa017KcENJGHVxxpblvoGcWnf43keURSR5zlBEBCGZfl2\no39qt71Xriy/UM2sJRTKcirocP+BBnLwJeWOMQhO+wA/iqmNjJJ22vzZJ36bxfNnwVqm7r6fqfse\nKNeQ+sg4ptZscfa5py8fQDgYa0g7HdbmLjH90MOcf+ZpsAbPD/iLf/cJvuN/+J8ZOXiIpNNiZPow\njbHxbc25lWaLtNvhrvEG68rhrtjnfAKRK3mwEnAg9DDCMuZ5WAFtZZCuYNz3+JbhGr4jGfIuK6Za\nrsPz3YQnOwmBK3m8WeV8mjOfF0gB91XjbXOoNgj6E2jTbo4qyrHbUc2/6YXuIPI2j2GMwXGc8t++\nvHwryzNtkl7pnDU82cCLJK67/73iBrApgIjj7Qq712OHdGgo5tJqSiuS5MqQK000EL/cMQav/D4h\nrjdIup1Nabe1hu7aMs989j9it3TRV+pNlmZnSNbXKb/iW7xKlSJNGTl8hNWZi4xNH6HWGub0U0+S\nrq5w4flV/q+f/Snu/1sfQDguc6++wuTxk1gEYSXm4L0PkBU554MqM1awpHI6uaajFO9rVvnQeJOX\ne+Xsp+VCcdgLCB3Bs52E59sJHaM5GHo8WI2Zjsom45msYFVpukrzVLvL480qi0XB+bQMBkeicMfg\ntMGGK4Mxpiyz3cSMqitxHIfmWIUiUzu6OdgtzuMCu+lIPuCtSS30eOfxYV6+1CYpNKcXu5wYqxK4\ng/WnO8Hg07hP0FrRXVpk6MBBsrSHK12sgfbqMvRLFW4YI4xmfX6231hbXlxHDx6iOTbOkYffRu/Q\nUeZOv0zcaLA6N8/Maul9q9KUl5/4Mo2RMSqtFt/40l8ycewEHc9HFTnpyCRfThTPyJjnOz1OpQUC\n+NczixhH4DmSqutwoh5yKPCYzXJWioICy3xWkBtL6DgcCH2kEByJfDJtOJ9m+ALms4KmK1l15aYy\n73p4QWkdJF1nU2n3WrheP1Jzsk57oYsMJHF9IIIYANXApR572B5cWkuJXMmxseqdPq23JIPgtE8o\nnceXOPa2x7nvfd+C1oY//e3/o1xzkS5BHFMdHsERDsZa2v3+JOG6CGFxw4Av/8G/J641GZ6
eYvzY\nSR7+lm9l5oXLrulpt01jZLSUTFeqqELhuC7SdTHdNo51cdxymu4G2sCZbsrhSsR0GNL0XF5Nc7S1\nKGM5GHjkxuALQW4sXaVxHYd1bTgUuqwULqvK8nIv4dF6FYPAE4LWDdRQfuj2xQ/989CGvFcgHEFw\nxdrRa8X3JcNTe+uYMeCNx2gt4OxSl0Jb2pliPS2ov4V9FO8Ug+C0T7DWUiQ98iwlqMQ886f/H2mn\nDdbiRRFHHn2M+tAw0vN5+k8/s7mfFwQ0J6ZYn5tj4dwZ0l6HqFrj8IOPUh+b4MDJe5l5+RvgCKJa\ng/GTd1Nptli7NENvbYXq0Am0UtRsxruqPvePt7gwVOU/zi1zLik4GHq8rVlFWXi0FjPkuXyjV7pR\nT0UB91UjjvZSnmonWGBVadq6QFmLwaHmSjxHIEUZlO66TiPv9ShShdYGNMjcQboORaZxpNic67Qb\njDYUuUa6V/vjqbx0J3f967tA5KkiTwuk6xBWbn5dbMD+ph56nByrsdorRUJy8P96RxgEpzuIVqoc\n1tb/aU4coMhzuqtrzLz4PNaWE2AnT9zDibe9gzzp8uxf/Bd6KxtjqgSPfvhjjB4+xFOf/QxFWnro\nZb0eZ57+KmGlQlCtM3H3vSSry1RbQ1iVk3U7OI5EC401BoGDGwSMxzH3tyokONxTq2Ct5eudHpEj\nGfNdJvrquqNRwLrStLzy7TPkuRyPy7Umh8tju++KQxbygrms4FwvI5ISXwhOxiHuLlVQjhRQbPzu\nkPWKMlj1b++2cTbtFqWCMYO4FmyWDq21pH2RhCo0lUZ4bVeM/LK3XzlDanARe7NwoO+1Jx1BJ1Ok\nhWZ4MA33tjIITneI7uoKneUlVJ6XajJXsnRpBsfCK1/9CloVIARhVOXhD3+EuFLj1OlTzJx6kY21\nptroCN/0fR9n+eJ5HvnwR3jqs59hfWkeIRwEFoMlWV+jdWCSottl/uJ5Ln7jeUDQnJzi3ve+j4nj\nJ/DCEC+IiOt1pOsRGkOiDMtK8fZqzJDv0XAl55KMWEoiR/BCJ8FgebxepeW7GEqp+IjvUleaZ9oJ\n2lqsgPHQ45lOwpE4BOmQGEPtJoctbuCHXinz3nB22BoHXktM2Lb/9gMIITYl7NfLhlxPUmTlFwzn\nNQg2BuxfpCMYrQVcWOkxs5Liuw7SETTjwVTc28UgON0BjNYsz1zAKMVCXzZ+6ZWXSTptiqJg7tRL\nWGtxfZ/j73wn9dYwFsvcq6dwpYsyliCM+L5f+Wc4jkOlNYTRmg/82E/i+gHGGJ7+0z9h/vQrFEVG\nniYopSja7f4ZWFZnLzD36itM3X0fE8dKQ1ndz3jmc0ViDPOZYqavrku05sFqTCvwyLVhVZUKwld6\nKY/5VYa8y2+lwkLNkyhjyLRlRSk8yrWy2HGo7ELgUOSKItW4Xtn7tEEY+xS5KrOm1yCYCGNvs6y3\ntSlXiNLHTyuDvKKkZ4wl65ajOMK4lKx7gbvNbWLAm4uZtZSLqwmZ0uRac/d4nUY8WH+6HQyC0x2g\nvbSAMYaVSzMsX7yI63uszF5EeD5zL71QbuS4BGFEvVXaFOWdDnG1Rm1oGK0UH/hv/jtaI6MARNUa\nUbW27THe9V/9AE/80R/QXlpg/tWX6a4uX3Ue5194lrjZZPLkXczjci7NWM4V2lh6RlNYy1quOJPm\nuI6gqy0fHK5TkYK20pxNMlJlOBIHjPiXP7AV6SABHId312P+01IbheF8lvF4s7pNcHE9jLV0uzmu\nEOSZwQ3czUAiHIEfehhjb7pRdyuOdAiinYOaI3dWCKpMbZYSi1wTRIMR3m92Kp4k8h1eXeqgtGW9\nV/Cek6MDc9jbwCA43REEYVxBOJKhqYOknTatqWnOP//s5U2M5tijj+NXKnz5//kkuiioDo9y97vf\ny93vfi/DU4forCzjSElcb1z1CK7nEVYrtBfnKbJy6OC
VFL2Es888hcpzVgQs5YpXk4xRz6UiJYcC\nj1OkfK3do7AWx8JsVjARuCijiYTDfKF4ar3He1s1AkdwLs051c0IHcG91RBpwdImtZZLqSLVhspN\n9I0UxvJyLyXLC4aFw3DoXVl9I88UeVKU2U4t2PMMxnEdyPq/75PJqVqZMsseeMHtCVOtiF6uiX3J\nUiejkynedqQ1CE63gUFw2iOMKUeEXzmmAqA2MoL0PIYmp8jTBGvGGT1ylFe/+jeb2zTGD3Ds7Y/z\n5f/3U6zMzuA4Aq0KvvXHf4qRg4dZX5zvN+KWY9rDatmLYa1l7vQp8jTl4L0PEsU1Lp16GUcIRBjh\nRxWKpEuRJvQcl6eOPcSTX3iSH37H2wkdB98RKGuZCnweqldoeJKXkpQ1pfAcWCwKXu6ldJUmsRaM\nZjaRpA1DR1ueWOvwapJT7dvBvL0eMxV6LChNRTrM5wVOXpBqQ2osd1Ujoh0u9IkxFNZCKGkbmIo8\nsGVTrtYW6TroXG8+5zwpUIXuj9coA1XazdHKEMTeLRlz4XqSuBZg4TWVEm81utAk3bLs6gfutrLn\ngFtDM/Z57EiLpFB89ewKoecws5IyVBmII/aaQXDaA9Jeh+WLF5DSpTo0vM0mCMpgUm0NEVaqpN0O\nQRzzhU/+bulYICWu5/Gu7/l+4nqjHKTmSozWhPU6jaHRqx7PbvH4Xp27xPJMOcnej2IuvPg8Qa2K\nNgbXd7E4FGkXx3V59eh9nD1wlLb1OHVqlt+67yCxjMmMZch3Wc5yfm92kTO9nLuqIWO+i7GQGsuh\nOCTRpQMEUvBSNyUQgoVcsZQrapGPJyCUDh8YaZAaQ09pVpRmKS/4m9UeUsCT7R4fHmkwdYXPXlU6\n+ELwUlJQVYZKqqhoQZ4W+JFLVAnKMetJOS59w1PNGotWpdR9w2WiSPUtm8G0XzImKNfAdvr9TlIU\nBUqpTUPW3ZLn+eb+vr8/xAdCCO4/0GA9VbiOc1UGP2BvGASnW0za7TB/5jSd5SUqrSG8MLwqOG3g\n+j5Vf4iZV17iG1/6c6w1+FHEXY+9i6hS4dzXn6YxMoZWCtf3+Y6f/ln8vpdYdWi4HJUh3W3rTboo\nyHpdPD/giT/+g3ItC4e41aI6NEJnaYEwruKGIUceeRvPxC3WvJAkyfjHr17ip48cIHTKAPRvZxZ5\ncr10oogcwQ9MDvNSL6Oa5KwrxYk4YLXQIATaGP7zaoe5vGDU83hPo8axainDPh4FPN3ukVtYLTTr\nSrNcFChKVdRioWi4kuqWcp8jBKO+RxpoMlOwnCo8BdqU49o3BgVu9DgVuSLrlSU+6UoE5bqUNRbX\n2z8B5Vbi+hKtS3unrQ3LdwprLVlW1j211riuu6syq7WWPC8zwTzP8bxb02x9KxiuBjx8sEk7VYzX\nB1nT7eDOv6PfZKgswwtCXN/HKEXcaF5z26zXY3Vuhs/+5v9K0l5HIGhOjPLwhz/K8sULeHHM4Yce\nZere+5g8cc9m6Q7K7Ks2dHlsfNrp0F5aYPHCObTWPPuf/pilmQsIY/HiGCEcktVlXM9n9Mhx3vHd\n38vkibu4+Mwr/PlKlwz4/EoXI2b5wclRjCdpFwopBAYY8XyGfI/DxrKcK3whORQGnIhLabgD5BYC\nR+I4ggcalw05hRDEriR2Zbnm5PicrIRcSHJOxgEO4O8gLKi7Dr7roF2HYeHgafC0RfqibHrdso/n\nu5vrLirX5JkuA5Mv37TlLiEE4T6SNgshLhvtOrs30BVCIKVEa73rseu3g+mhwRj328meBychhASe\nAC5aaz8qhDhKOaJ9CPgq8MPW2nyH/R4C/k+gDhjgccAD/nLLZgeB37HW/k9CiB8Ffg242L/vX1pr\n//XePKvLWGvprq5gtKbaGiKqN8jTlCCOqY+O77jmtEFnZYnl2Rm6K0tYSqPSo488TqXZokhTsAY3\nCIjq00gpyXrdzWC
nigLbdx3Pkx5r85doLy7QW1nmlaeeYP7VV8BahOvRbA0hXVmWSuIKD33wI0zd\ndQ8A//ieQ/zMC+f4q9UuxsIXVzoc8Hw+MtHioXqFwHFoeJL3D1dRxiKF4FQvRVmQIuFj40MAZFrT\ndCWJMRwOfbpaIxGE0iFwBJFTBrHp0Kewlne7EtMwTIU+Dc/F38Fvz3cc7qtG2GqE0+892hg+qApN\nbz1FenLThVwIgdGGLCnIkgKjLVHVR3l6IBi4TURRtBlcXgthGG4GtwFvbW5H5vQzwAuUQQbgnwK/\nbq39pBDiXwE/Dvzm1h2EEC7wO5SB62khxDBQWGtT4JEt2z0J/Ictu/6+tfa/37uncjVJe53uSl+m\nbS310TGGDkzd1L5eEHDhG88BAoGgPjrGO777e7h06iWE4zB88BCtqYN0lhbRRbFZHsyTHiuzM6Sd\nNkGlSlCJsdZSaQ0xd+4sS2fPbD5GVGswMn2E9vwlsl6P4+/4Jo498vb+6VqWlOE7R5ucSjLmcoWx\n8FfrHWq+5LFGlUOxz1qhOZXkuI5Dqz+i3bLdgy+Qkve0qqwXBo3hybUejoBHahFV1+VkHKAtyL4q\nULgw6oc7iiGstZztT+A9EHg0+j1UZVMs/degHA5oMoXny821oI3G2a3Kvf32DfzNjBAC133tl5WN\n7GnAgD0NTkKIg8B3Av8Q+DlRXiU+APzt/iafAH6FK4IT8G3AM9bap2FzBPuVxz4JjLE9k7rtOFuc\nDsQuv+3VhkdBGcJqDceVNCcm+dpn/oj1+TniZoujjz6G5wcMHTi4bb8iy0i7HdYWF1h97utUGg0a\nYxO0xicQwhLWahitMViGDh6ks7iAX4mpj41x4OQ9pN0Oq27As+0ep5OUIc/jl46M8y/PL7CgNK5w\nOJvkHAxzXATWAa1gXWnGfJcjoc+raUbLk6Xpa/95h1KCEJzuFizkGYF0GHJd7quVaw+ugHNJzsu9\npMzIXLljcOpow1q/yXcuV5vBadvrLgValRNrt5b3SrWejxdKMOWU291aG90MRpvysQeB762LykCl\nENSvchkZ8PrZ68zpN4CfBzZW7IeBVWut6t++AOyUZtwFWCHEZ4FR4JPW2n92xTYfp8yUtsqUvkcI\n8T7gJeBnrbXnrzywEOIngZ8EOHTo0Gt7VlsIq1Us41itiXboN7oeQgiOP/4O5s6c6lsIhSxdPE8Q\nxSSddborS7zyxJepDLVoTUwhhIPr+2itMNaQdTtYU66tpO11zizM01laZGhqmkpzCC+KUWmCH8U0\nxic4cNfdOEKwNneJL8YtnuukrCvD2+uSscDn545O8PuXVimsYToKOBIHNFyXlUJRWBjyJC/3MhJr\nGfE8qq7LYq42J9oeiwLWlGYpy3k10WgssF1FdiHNWCk0oDncL+ld9Zo6Dq4oJe3Va6jjwoqPUaVS\n78oA4UgHfw9VdXlakKdqs79q0Ij7FkTlsPCNcuJ0NAStw3f6jN507FlwEkJ8FJi31j4phHj/xp93\n2HQnDawLvJdynakHfE4I8aS19nNbtvlB4Ie33P4j4PestZkQ4r+lzMo+cNWDWftbwG8BPPbYY7dE\nf3ulO4NWpamo599Y1XPk4bchhODS6VM4QtJrr6HznLhWxwrB0swF1pcWWDx7hvrIOFmSUB8ZoUgS\nWgemwFraK8usL86Vqq0owgsjvOmAldlZpJT4QcTDH/oIusgxWuNIybLSZdZioeoIFgpFaiz/9YEh\n5vICgeCFTsrxKCCUDo6AlcKwrjSZNhQWlDHbBgZ6jqAqBQu5xhGCuisZD7avuY16HrNZQWYM1/JJ\n9RzB3XFIYe2OmRX0yz9XrCNZa8lT1X8dvD0LGrowm49ntNmWPQ94i6CzMjBBmT0NuOXsZeb0HuBj\nQojvAELKNaffAJpCCLefPR0EZnbY9wLw59baRQAhxJ8AbwM+17/9MOBaa5/c2OGK0
t9vU65t3XaK\nLGV55iJYS2145LpqPSjLgkceehvDU4dI2usYrbHCUmuNMH/mNIJSKFFkKe2lRZLOOkXSJc9Shqam\nCaoVZl95CS8Ika5Hc2ICiyCu1fHDCC+KOf7o49RHRlFFQd7rlqKIbsrpJGdIOljhkGpN4AiOxSHD\ngc/pXspMppjLCyxwXy2mrTWJNuS2HDA4l+W82Mk4WQk4FodUXYcXOgmJ0USO4EDgMexvD05Toc+5\nLMN3PJYKzeQ1Zvy5jsDdpaOrKjRFpjZvh5W9UbJ5oYvpFThS7EnJcMAbgKAG0TAsvgxBBt0lqAzf\n6bN6U7Fnwcla+4vALwL0M6e/a639ISHEp4DvpVTs/R3gD3fY/bPAzwshYiAHvhn49S33fxz4va07\nCCEmrbWz/ZsfoxRh3HaKLIN+pTFPU+KbqPQJx6E2PAICrDHUhkdLW6Jmk5FDhynSlO76Kul6G8d1\nKfIUC5x75ikWzp1FW41Vmsl77kFKjyJLWV+YozE2waEHH6bSKIUUuihKmbvn8XjDZSLIy5EWuUJb\nS0VKxgIPnRVUpEMgBa7jMNLvG2q6DpF0EBlctAUdrbmQFQSuQAPtQvHp+VW0hXurESfjqyNPIMv+\nJWUs3i1WZG0t74k9LLW5nsRtDLKltzxeBNl6f92pNghOt5g70ef0C8AnhRC/CnwN+DcAQoiPAY9Z\na3/ZWrsihPgXwFcoy35/Yq394y3H+H7gO6447v/YP4YCloEf3dunsTNhpUraaWO0pnKDrGkrwnGo\nj4xt+5uULo3RcQCGzTTt5SV6a6tkSZeZF7/BwoUzdNdWcT0Xr9FkZfYSaMXwwSniqYgAABuiSURB\nVMMMTU1vCirg8ogOgObEAYI4JrcW13GYCn3GfZcRz8V1HI5FDlOBz6M1hQImg8sZyHxWUJMOPW04\n1Uup+i5uPyicTzOEgNgRtFzJwejqsmaqDYUx9LTl8A73X4vCWDpaU5Xl8MINrLVkSYE1lrDiE1b8\nzf6mAQP2lEvPwZm+Hqv2w9ffdsCuuS3ByVr7eeDz/d9PA+/YYZtPA5/ecvt3KOXkOx3v2A5/28zU\n7iSOlFep624WozVpt1OKI65YryqD1yj1kVFmXnqBtYV5dJYT12u4boDrBbQX5gDw4gp3vfu9246R\n9Xp0lhdxXK/M0ohpui5dneMLwYjnbQ4AFEIQSkEory6LjQUeY3gcjQK6WpMYiy8cXAFneilSCCIp\neVersrlPW2kWckXNddAWXMeh7pT+eTfLqV5Kbi2+ENxTuTwAME8U7eUEay2q0NRag0bJAbeJC18p\nAxQWFl6Eg4/e6TN6UzFwiNhHrMzOoPIM4TiMTB/G2aHfo720xDe++BcYpai0hhiePkSl2WL54nl6\nayvEzRbH3/aOzZ6o9mK5TtVdW+mPe1CbkvcR36XlSRx23wskhKDqulS3/O2heoWTcYgjYHpLVnQh\nzcmtpa1LhZ4vBBa2zYC6Hsba0gQWKGzpJLhxtmVjbnmf0fvDX27AW4R0BVQXZAiqd2fPxVpYeRXS\ndahNlD9vcAbBaR9hdLmYb40pu+SvCE5FlrI8ewHH9fCCgLjR5NEPfxQsnHnmq/hRhbBSJaxWka7L\n6Sf/hjPPPEXcaDB88BBxvV4KJ7a4Vshb2J8xFXgsOIKalATOdhVfpgyuEFSl5N7q7t52jhBMhz7L\nhaLludubf2OPKPdLN4jawPNswE2StaF9CbwYGjfXNL8Na2HoWHkMR8LU49feNu9BugZhA/w9yuxV\nVj4GQGd+EJwG3FoaY+P01tbw43ib7ZG1lvnTr7A0c55Ks0VjZAysZXj6CL3VVeqjY0wcO7m5PtWa\nPEBvfY0zzzxF0lknS7pMHD9JY3wSP4zww4jFXNFWmmHfpX4T85Vu6vw996q+JWst2lraurQ1cl+j\nUKHlubR2yLSEEINS3oDds3YRVAJ5pwwaQfXG+
2ygcli7APVJOPJumHgEJh/YeVtrYekUWA3dBZh4\ncG8adt2gDLRFD6KbX+vezwyC0z7Cj2L86OoL7cqlGS6dOUVvdRVrDIcfegTX90nW18l6XZL2OrXh\nEdwgQLoejnQp0jWqw8NYUzYHT91zP15QZha5MVzMSjvDXmq4vxrt2XNKjCXRhshx6PSnyHb67g/V\nHYJi2f8kbmlGN2DAVbhBGZyEAzusrV6TdA2WXy1/AIZOwsiJGwQce8W/e4AQMHIX6ALc/WMG/HoY\nBKc3AMKC5wf4UYT0PKpDw6j8sleu9DyE4xDXG6zMXuTSKy8jpcv0/Q+gC8XIoaObgQnKUt6GA4P3\nOoPAQlbwXDeh6UoerMVXBRVfwFymaGvNsThgKVdc6AfGQ6G/LRu6lBXM5QWeEJyMw22qvJulyBTW\nghfsP1frAfuI1pEy0HjR7i7meRewpXy86JVlOvc6X+6EgKHjkKyUGc3Ge1IXZSblhhAPvZ5nsv2x\n3iSBCQbB6Q1Bc2KSoshZunCB+sgo3dUVWhMHcD0PIRy88HI/0dKF83RXVnADn9rIKI3RcbwrhrZJ\nITgRB3S1of46TTafavdYU5rFXDEZ+Fc5QuQWJkKPMesSOA7ZFoVedsWAvHWlsdayUCiq0uHQLqTm\nUAamLNkYR2/xwzfnqIwBtwAhXlv5Kx6Gmadh9hmoT5XBRdygXy+oXl02XD1X9khBmcX5lav3e4sz\nCE5vACwWrMEaRdJeR7ou1tptJcAiK217EA5aK1Q3p8gyemur6KKgOTGJtXYzmwgcZ5to4bUSS4e5\nrMCXgsoOVkORI6hLSUdrRn2XpuuSW4sARq5YQxrzXZ5uF3S0YblQ1Fy54zrTtdga6uxAuDfgVpO1\ny6A0+1QZbFQCfvVNla3sJwbBaR9hrWXl0gydpSUqrRatyQM4jkTnZZNpEFco8oyo3tgMMtZailSR\n9217qkPjxPU6RhVIzy/95jLD2kIHBHiBR1T1b1nJa8iVPKsUDpKdlNxCCI7G2zOgI9fIiJqey4k4\nYLEon8tuleFaCs4YhUBwYp+M+B7wJqIzX5bzvLAs78XD5XrTzZB1YfmV0kmieRiahy6X9QZZ044M\ngtM+oru6wsLZV9H9QYKVZouwUsUNAsJqFUdKqkPDRLVyNJYxlqSdUeRl1uQFpaOE1ilBHKHynDxJ\nEU5MlioEIF2J0RZ5LdfVXXIxy5nNFeczxfF2j3cO1W6803WY6JcFpRAMbzF2XcwVylpGffeaYokl\npdH9fVaUYWIwF2jA60VrePmzZf/Q2D0QtmD0PqiOwug9NycNtxbO/3UZjLxKqQ6Mh6B+YO/P/w3M\nIDjtI6wxBFFMr1hDCIEXlGtJQggaY1f3LRhtsNbiemXA8QMXpQwQoHJBVK0RxFWSdobrWqwtR0w4\n8tYJBWJHkttS+NDehePDtZBCMBVuz3pWC7WpLiysZTrcOSuqSIelomzQ3anEOGDArjn31/DKn4EM\nynWq+78bhAS5i0un0VsczHu7Uwe+hRkEpz3GaEOeqXLGUHD9l7vSamGtpTU5RW14ZEeHiK1sDNIz\n2hLXfKTnkK+V9v26Ly5wHEFcCwgrPo689cPx7qtGXMhy1pVi+AbrQ12tSbSl5ck9kYq3PBdfCLp9\nh3Xo91kV5WDAgYP4gF2T9Rtbi25ZgnN32ejdW4LeWqnYC+pQndhdT9VbmEFw2mOyXtG3DdJIef2p\nrI4jqY+M3vSxhRBE1e0fFteXqFzj+pel1MIRyD1y6W76LmO+S6INF7Kc+3REsEPWkmrDK70MC7SV\nvGod6rqP4bkU1qItjPrXf8vOZAU9Y1goNPdWQopUbY7RiKrBIEANuHl0AY4L9YMQNuGuD9/8vkbD\nq38JX/9UKSMfPgGPfBzSVZjpy8pbR/bs1N8MDILTXrM1Q7gNbTdh7MNtNkzoasOZJOOFTsIj1ZDp\neHvfR2YML
3RTziYZk4FH/BpUgqP+zcnCN6TqylqUtaWCsY8xhtJJcMCAmyBrQ2UUph8rxQ/eNYaP\n7USyBpeehZWzoFOQHlx8Ahy/P1rDlvOggupgxPs1GHxS95gw9vBDj7DiI6WD1obuWkp3LUWr179G\nsx/QxvKNbsqlQvGnS+2r7l8tNI6ApiexwMFrrBndCqZDn4p0OBB4+I6DH7pIV+L5Lq43EEgM2AVB\nrZSJy6AMJLvBj6E6Ds2D4NegcRhqk/2SnoDeSmlrtHRq0PdwDQaZ0x4jHIEfXn6ZVa43nbTzTBG9\nCXokRgOPYU/S0YZ235poK1XpIChdyA+H/jVHr98KrvT3c6RDVH3jv8YD7gDSKxtsL34V5p6FB74f\nwh1k31mnzLKi1uXsygvhrm+DqbeDF5QKvSIp7wtqMP9CWUlZnymzq6hZbjvIojYZBKfbhDGWrJuj\nlMEoTZ5pfG1xXQfvBkKJ/c7j9Zg/WXBRpsAToIzdZvBacSX3ViIsFv8WT7/dif+/vXOPsquq7/jn\ne869dx5JJtEEAiEJgaWhCALFAY1iS9VaiwhY0GJZoi7U+q62tmrXqgutbW1dFavYUnyAurS+qK80\nNqtEsOIjEDA8gkAQEBLKI5D3ZObee86vf+x9JzeTmck87p17c+f3WeusOWe/zvnlZp/f2Xv/9u/X\nbMOLwwEzY3BvhTzLKXYXDmmM44zBo7+ELTcHr99JCU6/5MD8rBr2L1ke1pYWP2d/Xs/8cNSoN4SY\ntziYlm9/BLb9Kuyb2vsknPCHzZXnMKLpbwpJqaRfSlodr4+TtF7SZknfkDTqZ62kUyT9XNImSXdK\n6o7pN0q6V9LGeBwZ07tie/fH9lc0W7bJUC1nZFkePowSkRYUA+Qd/lN7SZKwolTitLREYSBj12Dl\noDLFRDOimGqGF1uHyjy8r3zoCh1KntmwxWbNIMSZAqX5QQFh8NidYcQzktq0nE2iL/ctgaNPhbmL\ngmIa2g1bb4WhFseFaiNmYs3pz4D6X/QfgSvM7NnAduCykRUkFQhRcN9mZicBZwP1b7xLzOy0eDwR\n0y4DtpvZs4Ar4n3ahnorMVkw5qkMZQ3bDNsIqpWMfXvKVMqTe5klEr/d28MRacrxpSJUDp7amymy\nGIwQglHEbCVJheLotdCgkCizkpV/ACecE8zAB3fALV+Ex+teZ2kBnnkc9C4K8Z0my/G/Fyz5lIIJ\ntqxv3LMf5jR1rC9pKfBK4O+AP1ewbX4J8CexyJeAy4F/G1H15cAdZnY7gJk9NYHbnR/bAvg2cKUk\nmbXHGyotJPT2dYMZQwMVkqis0hlYpA9rXJCMY04epoHCSCOrZhSKY3v1rlYyBveWSZKwnqNEnDiv\nh3kDFUpKKLXwX3xOIWVJV5GBLOfICVr4dSJS2N9muZH4huSpUyjAqa+F8u6w7kQOOx6CxSfuL9M9\nPxwTIc9CgMLKvnCepLDs+SHc+xP3wq3fgYHNwAJ45/VwxLObINThQbMnoj8F/BVQ82mzENhhZrVP\n8y3AaGEoVwImaS1wBPB1M/unuvxrJGXAdcDHogI6BngEwMyqknbG+22rb1jSW4G3Aixfvnz6Ek6C\noBxE15wSlaFq2ETb5BeH5cbAniEsD166640z6pHCl7bldsiNurVpojzPqVZziqWUUinlqPnBhLzV\ne4kmanbe6UhCDfQGMmuR4MRXhXWnYi8847igWLLK5MzLswo8vgkevQO2Pxym8la+DLbeAvesgfKO\nusI74LP9cOkPYMlp0N3XcLHanaYpJ0nnAk+Y2a2Szq4lj1J0tO/sAnAWcAYwAKyTdKuZrSNM6W2V\nNI+gnF4PfHmibZvZ1cDVAP39/VP+xq/38D1ZkkR09czMCzTL8uG9PtVKNqZygrBJNatkpIVkXNkK\nxZSsmiMFrwtmBrlR6ikytLeMZUae5f7F7nQO8xbDmW8Jjl8LXXD/uqBo5i+
D5144MSu7ygDcuRp+\n/on9aRuugnkrRiimOtZfBUv6gy+/7mcGd0rH9MMp5zdErHammSOnFwHnSToH6Ab6CCOpBZIKcfS0\nFHh0lLpbgB+b2TYASWuA04F1ZrYVwMx2S/oacCZBOW0BlgFb4prVfODpZgg2uLdMtZKRFtK2N1NO\nCwlJmgSrra7xpxCTRKiUUhmsoiQf04qw2FUgLaZI4et8YNfQ8CiqUEjIzaiUM7p6XDk5HUShFI59\n2+GxO0Ko9m2bg8Ja9vzxp+B2PQo3X3ugYqqxeyvhVTzKWu9vfgEP/TQ4nN39GOTRHP0/gct3Tl+m\nNqZpbw8z+5CZLTWzFcDFwI/M7BLgBuCiWOwNwPdGqb4WOEVSb1Q0vwvcLakgaRGApCJwLnBXrPP9\n2B6x/R81Y70pWNiFBf+smh3ggaAdqa09zF3QQ3EM1z+DA2X27hocDtZXjn+r4xg2JEnw02dm5NEr\ngwj/PkDTpysdp2V0zQ/KwjIoD8C9/w2r3wur/yJM2T34E7h7DWz6AWz7NdzzQ/j+B+GmMWy0ijUT\n8wSSebDy3P15g0/B0I6gCPOhA+tdPsF1rsOUVmx++ADwdUkfA34JfAFA0nlAv5l92My2S/okcAth\nam6Nmf2XpDnA2qiYUuB64HOx3S8AX5F0P2HEdHEzHl7Sfv91xXTYIupwo1rJKA9WkRj2VFEerJIW\nJ6dUJFHqLlItVynN7aJQTDBcOTkdTJLA894AD58ID90ULOyyoRDd9qf/EuI+PbU5bKyduxjyKmy9\nfYzG5kAhDbbIha6wpnXf6lHKVWDuMtjzSBMFay/UJsZsLaG/v982bNgwpbrTWXNqFXmeM7B7aH+g\nwtziyCdEpi11F+nqLcZpPU15c3BWzcPIahzFXR6sUq1kFLsKFEtu6uwcpuzZBhs+D1s3Bp95WRme\nfjB6fegLo6xiDzz9G9g9imKZc1RYryrvgUJPWE/avObgcr3L4Y8/B9eMcD7boqm9aAPQ38x7+Lbx\nKXK4KSaAvTsH2bcnmIsXS2lQPhZlseBqSRKlaRhrDA1UqJSrw9OJoymoPDfKcaPu0ECZYqnnoDKO\nc1gwdxGc/cHgK686BE8/APeuDv70uhfAirPCOtVTm+G278DgwwfW3/sY0AcLjoClZ8DyVaMrp/dt\nCGbnx58PD8SVkL9pypJ62+DKaZZhZuSVnN65JbrnlMiqOUP7QliPrJLBNK0Is+hbzyxY7KXJwaMi\niWGzdbfoczqC3meEv31HwYoXjl7m5Avh6t8ZJWMXcGTw27fkVLjgC/C/n4G+Z8F5H4FnLt1f9NIv\nN/rJ2xZXTrOI3r4u9u7YRzXLGdxXpXd+D2khYe/OQfIsp5om056uLHUXGRqskKbJ8EbjkdTiUOVZ\n3vI9UY4zYxx1MrxpLVzzSg6yzCt2weLnBt98S0+H0y4atYnZhL8ZZhFJklAoFenqKZJnOVk1Qwp7\nrmqRcofdhEVPFrufHmBg18TDexRKKXP6uumeUxpXySWJxvVC4TgdR5LCsjPgFaOYk696N5zymslt\n6u1wfOQ0iwgjlhI7t+2lWEqHp9S6ax4riumwi6PadN/gQJk0DRZ4c/q84zjOtEhSOPlV8Ixl8ODP\nYMnpcPIrgwWgcwCunGYZhWJK38IQKjer7A8dn46IKyVpeNO7EpH4CMdxGsOcRXD0SXDUiSEAoSum\nUXHlNAvIs3zYEi8pJBAjSYxnjFBzVFvqKSIx5gZex3EmiRRCZjjj4m+cDqccPT5IomdeF8VSSpp2\nAeMrJyCOqPyrznGcmceVU4dTjYYMNdPuJEmnbb5d26cUvEMU3KjBcZyG45/FHU5NeTRyFFTeV6Fa\nzqgMVRsSybfmqcJxHKeGj5w6nEIxpTC/se6B6r0+THfQVAtcmGdGV2/RR2KO4wCunJwpUOouDPvO\nK0wzkm+1kpFnxuBAOXqX6J6xWFfNpjJ
UJctySl0F94ThOJPEe4wzaaTgFHa6igmiFaAZSSLSYog7\n1QnU9olVyxmDA5VWP47jHHb4yMlpKWkhYd7CXsr7quR5Pi2ns+1E/cykz1I6zuSZ1SEzJD0J/KbV\nz1HHImBbqx9iBnF5O5fZJCvMPnlPMLN5zbzBrB45mdkRrX6GeiRtaHaMlHbC5e1cZpOsMDvlbfY9\nfM3JcRzHaTtcOTmO4zhthyun9uLqVj/ADOPydi6zSVZweRvOrDaIcBzHcdoTHzk5juM4bYcrJ8dx\nHKftcOU0A0j6hqSN8XhI0sYR+csl7ZH0/jHqHydpvaTNsa1STO+K1/fH/BXNl2Z8xpJV0pl16bdL\nevUY9V8i6TZJd0n6kqRCTD9b0s66Nj48k3KNRRPllaRPx9/2Dkmnz6RcY9EAeV8a5d0o6SZJz4rp\nb5T0ZF0bb55JucaiifJ2Yt/9SV25RyV9N6ZPre+amR8zeAD/DHx4RNp1wLeA949R55vAxfH8KuDt\n8fwdwFXx/GLgG62WbyxZgV6gEM+PBp6oXdeVT4BHgJXx+qPAZfH8bGB1q2WaQXnPAX4ICHgBsL7V\n8k1X3ph3H3BiPH8HcG08fyNwZatlmkF5O6rvjlL/OuDSeD6lvusjpxlEkoDXAv9Rl3YB8ACwaZw6\nLwG+HZO+BFwQz8+P18T8l8byLWekrGY2YGbVmN0NjGaJsxAYMrP74vX/ABc2+1kbQRPkPR/4sgV+\nASyQdHTTBJgkU5SXmN4Xz+cDjzbzORtFE+TttL5bX38e4Z313ek8hyunmeXFwONmthlA0hzgA8BH\nxqmzENhR959jC3BMPD+G8OVNzN8Zy7cDB8gKIOn5kjYBdwJvq5OpxjagKKm20/4iYFld/qo4rfBD\nSSc18+GnQKPlHf5tI/W/ezswFXkB3gyskbQFeD3w8bq8C+MU5rclLRulbitptLyd1nfreTWwzsx2\n1aVNuu+6cmoQkq6P6wYjj/Prir2OulETQSldYWZ7xmt6lDSbQF7TmKKsmNl6MzsJOAP4kKTuEflG\nmOK4QtLNwG6g1gluA441s1OBzzDNr7LJ0CJ5W/LbQvPkjbwPOMfMlgLXAJ+M6T8AVpjZKcD17B9V\nNJ0WydtRfXcEI+tPre+2em5zthwEP4aPA0vr0n4CPBSPHcDTwLtG1BPhC7s257sKWBvP1wKr6trf\nRty71m6yjlLmBqD/EO28HPjmGHkPAYtaLWuz5AX+HXhdXd69wNGtlnU68gJHAL+uu14O3D1K3RTY\n2Wo5mylvp/ZdwujvKaB7nPoT6rs+cpo5XgbcY2Zbaglm9mIzW2FmK4BPAX9vZlfWV7Lwa95AmPIB\neAPwvXj+/XhNzP9RLN9qDpJVweKwZol2LHAC4T/pAUg6Mv7tIkx5XhWvj6rNyUs6kzDqf6q5YkyY\nhstL+G0vVeAFhJf1/zVViokzVXm3A/MlrYzXvw/8KtapX087r5beJjRcXjqw70ZeQzB+GKyrP7W+\n22pNPVsO4FrCXO1Y+ZdTZ60HrAGWxPPjgZuB+wlWfV0xvTte3x/zj2+1nGPJSphv3wRsJAzzLxhD\n1k8QOvC9wHvryrwr1r8d+AXwwlbL2WR5BXwW+DVhnn/cUddhJO+rozy3AzfW/s8C/1D3+94A/Far\n5WyyvB3Xd+P1jcArRtSfUt9190WO4zhO2+HTeo7jOE7b4crJcRzHaTtcOTmO4zhthysnx3Ecp+1w\n5eQ4juO0Ha6cHGeKSBrPs0cj2v+8pOfE87+eQv0Vku5q/JM5TvNxU3LHmSKS9pjZ3Ha9l0IYhtVm\ndnJTHspxmoiPnByngUg6VtK66MB0naTlMf1ahfhMP5P0gKSLYnoi6V8lbZK0WtKaurwbJfVL+jjQ\noxAL56sjR0SS3i/p8nj+vOhg8+fAO+vKpJI+IemW+Gx/OoP/LI4zaVw5OU5juZIQ6uIU4KvAp+vy\njgb
OAs5lv3fqPwJWAM8leLBeNbJBM/sgsM/MTjOzSw5x/2uA95jZyHYuI7hAOoPgvPMtko6bjGCO\nM5O4cnKcxrIK+Fo8/wpBGdX4rpnlZnY3sDimnQV8K6Y/RnDdMyUkzQcWmNmP6+5f4+UEX30bgfUE\nB53Pnuq9HKfZFFr9AI7T4dQv6g7VnWvE38lQ5cAPy1r4AjF22AUB7zaztVO4n+PMOD5ycpzG8jNC\njCaAS4CbDlH+JkKQvUTSYkJI69GoSCrG88eBIyUtjN7MzwUwsx3ATkm10Vr9FOBa4O21NiStVAh2\n6ThtiY+cHGfq9CpEOK3xSeA9wBcl/SXwJPCmQ7RxHfBS4C7gPsKU285Ryl0N3CHpNjO7RNJHY9kH\ngXvqyr0p3n+AoJBqfJ6wtnVbDF/wJHDBhKR0nBbgpuSO02IkzTWzPZIWEsInvCiuPznOrMVHTo7T\nelZLWgCUgL91xeQ4PnJyHMdx2hA3iHAcx3HaDldOjuM4TtvhyslxHMdpO1w5OY7jOG2HKyfHcRyn\n7fh/JQpnu8eUqsgAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Visualising the clusters on a map\n", "def plot_clusters(frame):\n", " city_long_border = (-74.03, -73.75) # X-axis limits\n", " city_lat_border = (40.63, 40.85) # Y-axis limits\n", " fig, ax = plt.subplots(ncols=1, nrows=1)\n", " \n", " # Create a scatter plot of first 100000 data points with longitude on x-axis and latitude on y-axis\n", " # Parameter 'c' => Points belonging to the same cluster must have the same color\n", " # Parameter 'lw' => lw=0 means linewidth is 0.\n", " ax.scatter(frame.pickup_longitude.values[:100000], frame.pickup_latitude.values[:100000], s=10, lw=0,\n", " c=frame.pickup_cluster.values[:100000], cmap='tab20', alpha=0.2)\n", " ax.set_xlim(city_long_border)\n", " ax.set_ylim(city_lat_border)\n", " ax.set_xlabel('Longitude')\n", " ax.set_ylabel('Latitude')\n", " plt.show()\n", "\n", "plot_clusters(frame_with_durations_outliers_removed)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Time-binning\n", "* Since we want to predict the no. 
of pickups at a given timestamp, we divide the time into bins (intervals) of 10 minutes and allocate each pickup to a particular bin.\n", "* Note: The bin size (10 minutes) is a design choice, not a fixed rule.\n", "* Currently we have the data for the month of Jan-2015, so we use the unix timestamps to divide the time.\n", "* No. of minutes in the month of Jan = 24\\*60\\*31 = 44640 (since we have 31 days in Jan)\n", "* So if we divide the time into 10 min intervals, we will have 44640/10 = **4464** bins (for the month of Jan 2015 only)\n", "\n", "Get the unix timestamps of all the pickups" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1421328939.0, 1420902218.0, 1420902218.0, 1420902219.0, 1420902219.0]" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "unix_pickup_times=[i for i in frame_with_durations_outliers_removed['pickup_times'].values]\n", "unix_pickup_times[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Binning procedure\n", "* Refer https://www.unixtimestamp.com/ to get the unix timestamp of 01 Jan 2015 (00:00:00).\n", "* It is **1420070400**\n", "* The idea is: the first 10 minutes from 1420070400 fall in bin 0, the next 10 minutes fall in bin 1, and so on.\n", "* So we first subtract 1420070400 from every timestamp of the dataset.\n", "* **Note:** Since the timestamps are in seconds, we also divide each value by 600 (60*10) and keep the integer part, so that each entry belongs to a particular 10-minute bin" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": true }, "outputs": [], "source": [ "tenminutewise_binned_unix_pickup_times = [int((i-1420070400)/600) for i in unix_pickup_times]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ideally all the values should lie between 0 and 4463. Let's check whether that is the case:" ] }, { "cell_type": "code", "execution_count": 48, "metadata":
{}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Min = -33, Max = 4430\n" ] } ], "source": [ "print(\"Min = {}, Max = {}\".format(min(tenminutewise_binned_unix_pickup_times), max(tenminutewise_binned_unix_pickup_times)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The minimum bin is -33 and not 0. What can be the reason?\n", "\n", "**Notice** that the unix timestamp we got from https://www.unixtimestamp.com/index.php assumes the input time is in UTC/GMT (Greenwich Mean Time). However, New York falls under EST (Eastern Standard Time).\n", "\n", "So a timezone difference is the likely cause. Note that 33 bins correspond to 330 minutes (5.5 hours), whereas EST is 5 hours (30 bins) behind UTC, so the shift probably reflects the timezone in which the pickup times were converted to unix timestamps rather than EST itself. Since the observed range (-33 to 4430) spans exactly 4464 bins, we add 33 to each bin index.\n", "\n", "Refer https://www.timeanddate.com/time/zones/est for EST timezone" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Min = 0, Max = 4463\n" ] } ], "source": [ "tenminutewise_binned_unix_pickup_times = [i+33 for i in tenminutewise_binned_unix_pickup_times]\n", "print(\"Min = {}, Max = {}\".format(min(tenminutewise_binned_unix_pickup_times), max(tenminutewise_binned_unix_pickup_times)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now this looks right: the time bins run from 0 to 4463, i.e. a total of 4464 bins.\n", "\n", "Finally, add a pickup_bins column to the dataframe."
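The binning steps described above can also be written as one vectorised expression. The following is a minimal sketch, assuming NumPy is available; `to_bins`, `BIN_SECONDS` and `OFFSET_BINS` are hypothetical names introduced only for illustration, and the offset of 33 is the empirical shift observed in this notebook, not a general constant.

```python
import numpy as np

BIN_SECONDS = 600             # 10-minute bins, expressed in seconds
JAN_2015_START = 1420070400   # 2015-01-01 00:00:00 UTC, as quoted in the text
OFFSET_BINS = 33              # empirical shift found in this notebook (assumption)


def to_bins(unix_times, start=JAN_2015_START, offset=OFFSET_BINS):
    """Hypothetical helper: map unix timestamps (seconds) to 10-minute bin indices."""
    ts = np.asarray(unix_times, dtype=np.float64)
    # Truncate toward zero, like int(x) in the list comprehension, then shift.
    return np.trunc((ts - start) / BIN_SECONDS).astype(np.int64) + offset


# Sample values taken from the unix_pickup_times output shown earlier.
sample = [1421328939.0, 1420902218.0, 1420902219.0]
bins = to_bins(sample)
print(bins)
```

With these sample timestamps the first value lands in bin 2130 and the other two in bin 1419. Working on the whole array at once avoids a Python-level loop over roughly ten million pickups, though the list comprehension used in the notebook gives the same bins.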
] }, { "cell_type": "code", "execution_count": 50, "metadata": { "collapsed": true }, "outputs": [], "source": [ "frame_with_durations_outliers_removed['pickup_bins'] = np.array(tenminutewise_binned_unix_pickup_times)" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['passenger_count', 'trip_distance', 'pickup_longitude',\n", " 'pickup_latitude', 'dropoff_longitude', 'dropoff_latitude',\n", " 'total_amount', 'trip_times', 'pickup_times', 'Speed', 'pickup_cluster',\n", " 'pickup_bins'],\n", " dtype='object')" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "frame_with_durations_outliers_removed.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's copy the dataframe under a shorter name" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": true }, "outputs": [], "source": [ "jan_2015_frame = frame_with_durations_outliers_removed.copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Delete the old dataframe and unwanted data structures by uncommenting and executing the cell below:" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "collapsed": true }, "outputs": [], "source": [ "'''\n", "del frame_with_durations_outliers_removed\n", "del unix_pickup_times\n", "del tenminutewise_binned_unix_pickup_times\n", "'''" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " passenger_count trip_distance pickup_longitude pickup_latitude \\\n", "0 1 1.59 -73.993896 40.750111 \n", "1 1 3.30 -74.001648 40.724243 \n", "2 1 1.80 -73.963341 40.802788 \n", "3 1 0.50 -74.009087 40.713818 \n", "4 1 3.00 -73.971176 40.762428 \n", "\n", " dropoff_longitude dropoff_latitude total_amount trip_times \\\n", "0 -73.974785 40.750618 17.05 18.050000 \n", "1 -73.994415 40.759109 17.80 19.833333 \n", "2 -73.951820 40.824413 10.80 10.050000 \n", "3 -74.004326 40.719986 4.80 1.866667 \n", "4 -74.004181 40.742653 16.30 19.316667 \n", "\n", " pickup_times Speed pickup_cluster pickup_bins \n", "0 1.421329e+09 5.285319 34 2130 \n", "1 1.420902e+09 9.983193 2 1419 \n", "2 1.420902e+09 10.746269 16 1419 \n", "3 1.420902e+09 16.071429 38 1419 \n", "4 1.420902e+09 9.318378 22 1419 " ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "jan_2015_frame.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** Moving forward the two newly added columns in the above dataframe will be very useful:\n", "* pickup_cluster => The cluster (region) to which a pickup belongs.\n", "* pickup_bins => The time bin (time interval) to which a pickup belongs.\n", "\n", "Let's write a general function which does all the above steps, so that we can use this function to prepare the data for other months." 
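A general function needs the unix timestamp of the first day of each month. Rather than looking each value up by hand, it can be computed; below is a minimal sketch using only the standard library, where `month_start_unix` is a hypothetical helper name that is not used elsewhere in this notebook.

```python
import calendar


def month_start_unix(year, month):
    """Hypothetical helper: unix timestamp of 00:00:00 UTC on the 1st of a month."""
    # calendar.timegm interprets the time tuple as UTC, independent of the
    # machine's local timezone.
    return calendar.timegm((year, month, 1, 0, 0, 0))


# Month starts for Jan-Jun of 2015 and 2016.
starts_2015 = [month_start_unix(2015, m) for m in range(1, 7)]
starts_2016 = [month_start_unix(2016, m) for m in range(1, 7)]
print(starts_2015)
print(starts_2016)
```

Because `calendar.timegm` always works in UTC, the result does not depend on where the code runs, which is one way to keep the bin offsets reproducible across machines.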
] }, { "cell_type": "code", "execution_count": 63, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Refer:https://www.unixtimestamp.com/\n", "\n", "# 1420070400 : 2015-01-01 00:00:00 \n", "# 1422748800 : 2015-02-01 00:00:00 \n", "# 1425168000 : 2015-03-01 00:00:00\n", "# 1427846400 : 2015-04-01 00:00:00 \n", "# 1430438400 : 2015-05-01 00:00:00 \n", "# 1433116800 : 2015-06-01 00:00:00\n", "\n", "# 1451606400 : 2016-01-01 00:00:00 \n", "# 1454284800 : 2016-02-01 00:00:00 \n", "# 1456790400 : 2016-03-01 00:00:00\n", "# 1459468800 : 2016-04-01 00:00:00 \n", "# 1462060800 : 2016-05-01 00:00:00 \n", "# 1464739200 : 2016-06-01 00:00:00\n", "\n", "def add_pickup_bins(frame,month,year):\n", "    unix_pickup_times=[i for i in frame['pickup_times'].values]\n", "    ## Below are unix timestamps for Jan 01 2015, Feb 01 2015, and so on..\n", "    unix_times = [[1420070400,1422748800,1425168000,1427846400,1430438400,1433116800],\\\n", "                  [1451606400,1454284800,1456790400,1459468800,1462060800,1464739200]]\n", "    \n", "    # row 0 holds the 2015 month starts, row 1 the 2016 month starts\n", "    start_pickup_unix=unix_times[year-2015][month-1]\n", "    # 600 seconds per bin; +33 is the offset found in the Jan 2015 analysis above\n", "    tenminutewise_binned_unix_pickup_times=[(int((i-start_pickup_unix)/600)+33) for i in unix_pickup_times]\n", "    frame['pickup_bins'] = np.array(tenminutewise_binned_unix_pickup_times)\n", "    return frame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's calculate the number of pickups that happened in a particular cluster, in a given 10-min interval for Jan 2015" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " no_of_trips\n", "pickup_cluster pickup_bins \n", "0 1 105\n", " 2 199\n", " 3 208\n", " 4 141\n", " 5 155" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# count() followed by to_frame() gives the same table as the older dict-style agg,\n", "# which recent pandas versions no longer accept\n", "jan_2015_groupby = jan_2015_frame.groupby(['pickup_cluster','pickup_bins']).trip_distance.count().to_frame('no_of_trips')\n", "jan_2015_groupby.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note that the dataframe has a two-level index**\n", "* primary index: pickup_cluster (cluster number)\n", "* secondary index: pickup_bins (we divide the whole month into 10-min intervals: 24\\*31\\*60/10 = 4464 bins)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Explanation:\n", "In cluster 0, there were 105 pickups in bin 1;
\n", "In cluster 0, there were 199 pickups in bin 2; and so on..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Preparing the data for Jan 2016 - test data\n", "**Note:** For cleaning and preparing the test (Jan 2016) data, we use the same parameters that we found while analysing the train (Jan 2015) data. We must **not** perform the analysis, cleaning, processing, etc. on the test data separately because that is not possible after the model is deployed." ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Return with trip times..\n", "Remove outliers..\n", "Number of pickup records = 10906858\n", "Number of outlier coordinates lying outside NY boundaries: 214677\n", "Number of outliers from trip times analysis: 27190\n", "Number of outliers from trip distance analysis: 79742\n", "Number of outliers from speed analysis: 21047\n", "Number of outliers from fare analysis: 4991\n", "Total outliers removed 297784\n", "---\n", "Estimating clusters..\n", "Final groupbying..\n" ] } ], "source": [ "# Until now we have cleaned and prepared the data for the month of Jan 2015.\n", "# Now do the same operations for the month of Jan 2016, which we will use for testing.\n", "\n", "# Steps:\n", "# 1. get the dataframe which includes only the required columns\n", "# 2. add trip times, speed, unix time stamp of pickup_time\n", "# 3. remove the outliers based on coordinates, trip_times, trip_distance, speed, total_amount\n", "# 4. add pickup_cluster to each data point (using kmeans clustering)\n", "# 5. add pickup_bin (index of the 10 min interval to which that trip belongs)\n", "# 6.
group the data by 'pickup_cluster' and 'pickup_bin'\n", "\n", "# Data Preparation for the month of Jan 2016\n", "def datapreparation(month,kmeans,month_no,year_no):\n", "    \n", "    print (\"Return with trip times..\")\n", "\n", "    frame_with_durations = return_with_trip_times(month)\n", "    \n", "    print (\"Remove outliers..\")\n", "    frame_with_durations_outliers_removed = remove_outliers(frame_with_durations)\n", "    \n", "    print (\"Estimating clusters..\")\n", "    # reuse the kmeans model fitted on Jan 2015; do not refit on the test month\n", "    frame_with_durations_outliers_removed['pickup_cluster'] = kmeans.predict(frame_with_durations_outliers_removed[['pickup_latitude', 'pickup_longitude']])\n", "\n", "    print (\"Final groupbying..\")\n", "    final_updated_frame = add_pickup_bins(frame_with_durations_outliers_removed,month_no,year_no)\n", "    # count() followed by to_frame() gives the same table as the older dict-style agg\n", "    final_groupby_frame = final_updated_frame.groupby(['pickup_cluster','pickup_bins']).trip_distance.count().to_frame('no_of_trips')\n", "    \n", "    return final_updated_frame,final_groupby_frame\n", "    \n", "month_jan_2016 = dd.read_csv('C:\\\\Users\\\\HARSHALL\\\\Desktop\\\\Harshall\\\\Courses\\\\Applied AI\\\\Case Studies\\\\Taxi Demand Prediction\\\\Data\\\\yellow_tripdata_2016-01.csv')\n", "\n", "jan_2016_frame,jan_2016_groupby = datapreparation(month_jan_2016,kmeans,1,2016)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Saving the dataframes for future reference**" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "collapsed": true }, "outputs": [], "source": [ "jan_2015_frame.to_pickle(\"Save/jan_2015_frame\")\n", "jan_2016_frame.to_pickle(\"Save/jan_2016_frame\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Read the files by executing the cell below**" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "jan_2015_frame = pd.read_pickle(\"Save/jan_2015_frame\")\n", "jan_2016_frame = pd.read_pickle(\"Save/jan_2016_frame\")" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ {
"data": { "text/html": [ "
" ], "text/plain": [ " passenger_count trip_distance pickup_longitude pickup_latitude \\\n", "5 2 5.52 -73.980118 40.743050 \n", "6 2 7.45 -73.994057 40.719990 \n", "7 1 1.20 -73.979424 40.744614 \n", "8 1 6.00 -73.947151 40.791046 \n", "9 1 3.21 -73.998344 40.723896 \n", "\n", " dropoff_longitude dropoff_latitude total_amount trip_times \\\n", "5 -73.913490 40.763142 20.3 18.50 \n", "6 -73.966362 40.789871 27.3 26.75 \n", "7 -73.992035 40.753944 10.3 11.90 \n", "8 -73.920769 40.865578 19.3 11.20 \n", "9 -73.995850 40.688400 12.8 11.10 \n", "\n", " pickup_times Speed pickup_cluster pickup_bins \n", "5 1.451587e+09 17.902703 28 0 \n", "6 1.451587e+09 16.710280 24 0 \n", "7 1.451587e+09 6.050420 21 1 \n", "8 1.451587e+09 32.142857 10 1 \n", "9 1.451587e+09 17.351351 24 1 " ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "jan_2016_frame.head()" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " no_of_trips\n", "pickup_cluster pickup_bins \n", "0 1 63\n", " 2 217\n", " 3 189\n", " 4 137\n", " 5 135" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "jan_2016_groupby.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Smoothing" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Gets the unique bins where pickup values are present for each region\n", "\n", "# for each cluster region we collect all the indices of 10-min intervals in which pickups happened \n", "# we observed that some pickup_bins do not have any pickups\n", "def return_unq_pickup_bins(frame):\n", "    values = []\n", "    for i in range(0,40):\n", "        new = frame[frame['pickup_cluster'] == i]\n", "        list_unq = list(set(new['pickup_bins']))\n", "        list_unq.sort()\n", "        values.append(list_unq)\n", "    return values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### For every month we get all indices of 10-min intervals in which at least one pickup happened" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#jan\n", "jan_2015_unique = return_unq_pickup_bins(jan_2015_frame)\n", "jan_2016_unique = return_unq_pickup_bins(jan_2016_frame)" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Total no. of bins (across all clusters) with zero pickups in Jan 2015 = 8201\n" ] } ], "source": [ "total = 0\n", "for i in range(40):\n", "    total += (4464 - len(jan_2015_unique[i]))\n", "print(\"Total no. of bins (across all clusters) with zero pickups in Jan 2015 = \",total)" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Below is the list of no.
of zero pickups in each cluster of Jan 2015\n", "**********************************************************************\n", "**********************************************************************\n", "For 0 th cluster, number of 10 min intervals with zero pickups: 41\n", "------------------------------------------------------------\n", "For 1 th cluster, number of 10 min intervals with zero pickups: 1986\n", "------------------------------------------------------------\n", "For 2 th cluster, number of 10 min intervals with zero pickups: 30\n", "------------------------------------------------------------\n", "For 3 th cluster, number of 10 min intervals with zero pickups: 355\n", "------------------------------------------------------------\n", "For 4 th cluster, number of 10 min intervals with zero pickups: 38\n", "------------------------------------------------------------\n", "For 5 th cluster, number of 10 min intervals with zero pickups: 154\n", "------------------------------------------------------------\n", "For 6 th cluster, number of 10 min intervals with zero pickups: 35\n", "------------------------------------------------------------\n", "For 7 th cluster, number of 10 min intervals with zero pickups: 34\n", "------------------------------------------------------------\n", "For 8 th cluster, number of 10 min intervals with zero pickups: 118\n", "------------------------------------------------------------\n", "For 9 th cluster, number of 10 min intervals with zero pickups: 41\n", "------------------------------------------------------------\n", "For 10 th cluster, number of 10 min intervals with zero pickups: 26\n", "------------------------------------------------------------\n", "For 11 th cluster, number of 10 min intervals with zero pickups: 45\n", "------------------------------------------------------------\n", "For 12 th cluster, number of 10 min intervals with zero pickups: 43\n", 
"------------------------------------------------------------\n", "For 13 th cluster, number of 10 min intervals with zero pickups: 29\n", "------------------------------------------------------------\n", "For 14 th cluster, number of 10 min intervals with zero pickups: 27\n", "------------------------------------------------------------\n", "For 15 th cluster, number of 10 min intervals with zero pickups: 32\n", "------------------------------------------------------------\n", "For 16 th cluster, number of 10 min intervals with zero pickups: 41\n", "------------------------------------------------------------\n", "For 17 th cluster, number of 10 min intervals with zero pickups: 59\n", "------------------------------------------------------------\n", "For 18 th cluster, number of 10 min intervals with zero pickups: 1191\n", "------------------------------------------------------------\n", "For 19 th cluster, number of 10 min intervals with zero pickups: 1358\n", "------------------------------------------------------------\n", "For 20 th cluster, number of 10 min intervals with zero pickups: 54\n", "------------------------------------------------------------\n", "For 21 th cluster, number of 10 min intervals with zero pickups: 30\n", "------------------------------------------------------------\n", "For 22 th cluster, number of 10 min intervals with zero pickups: 30\n", "------------------------------------------------------------\n", "For 23 th cluster, number of 10 min intervals with zero pickups: 164\n", "------------------------------------------------------------\n", "For 24 th cluster, number of 10 min intervals with zero pickups: 36\n", "------------------------------------------------------------\n", "For 25 th cluster, number of 10 min intervals with zero pickups: 42\n", "------------------------------------------------------------\n", "For 26 th cluster, number of 10 min intervals with zero pickups: 32\n", 
"------------------------------------------------------------\n", "For 27 th cluster, number of 10 min intervals with zero pickups: 215\n", "------------------------------------------------------------\n", "For 28 th cluster, number of 10 min intervals with zero pickups: 37\n", "------------------------------------------------------------\n", "For 29 th cluster, number of 10 min intervals with zero pickups: 42\n", "------------------------------------------------------------\n", "For 30 th cluster, number of 10 min intervals with zero pickups: 1181\n", "------------------------------------------------------------\n", "For 31 th cluster, number of 10 min intervals with zero pickups: 43\n", "------------------------------------------------------------\n", "For 32 th cluster, number of 10 min intervals with zero pickups: 45\n", "------------------------------------------------------------\n", "For 33 th cluster, number of 10 min intervals with zero pickups: 44\n", "------------------------------------------------------------\n", "For 34 th cluster, number of 10 min intervals with zero pickups: 40\n", "------------------------------------------------------------\n", "For 35 th cluster, number of 10 min intervals with zero pickups: 43\n", "------------------------------------------------------------\n", "For 36 th cluster, number of 10 min intervals with zero pickups: 37\n", "------------------------------------------------------------\n", "For 37 th cluster, number of 10 min intervals with zero pickups: 322\n", "------------------------------------------------------------\n", "For 38 th cluster, number of 10 min intervals with zero pickups: 37\n", "------------------------------------------------------------\n", "For 39 th cluster, number of 10 min intervals with zero pickups: 44\n", "------------------------------------------------------------\n" ] } ], "source": [ "# for each cluster number of 10-min intervals with 0 pickups\n", "print(\"Below is the list of no. 
of zero pickups in each cluster of Jan 2015\")\n", "print('*'*70)\n", "print('*'*70)\n", "for i in range(40):\n", "    print(\"For\",i,\"th cluster, number of 10 min intervals with zero pickups: \",4464 - len(jan_2015_unique[i]))\n", "    print('-'*60)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**We need to fill these missing bins with 0s in the Jan 2016 data, which will be used for building models**" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Fills a value of zero for every bin where no pickup data is present\n", "# count_values: number of pickups that happened in each region for each 10-min interval;\n", "#               there won't be any entry for an interval with no pickups\n", "# values: the unique pickup bins present for each cluster\n", "\n", "# for every 10-min interval (pickup_bin) we check whether it is present in the cluster's unique bins:\n", "# if it is, we add count_values[ind] to the smoothed data\n", "# if not, we add 0 to the smoothed data\n", "# finally we return the smoothed data\n", "def fill_missing(count_values,values):\n", "    smoothed_regions=[]\n", "    ind=0 # ind iterates over count_values only\n", "    for r in range(0,40):\n", "        smoothed_bins=[]\n", "        for i in range(4464):\n", "            if i in values[r]:\n", "                smoothed_bins.append(count_values[ind])\n", "                ind+=1\n", "            else:\n", "                smoothed_bins.append(0)\n", "        smoothed_regions.extend(smoothed_bins)\n", "    return smoothed_regions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Preparing the DataFrame \"baseline_df\", which will be used for the baseline models.
\n", "It will contain all the smoothed values from Jan-2016" ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Filling Missing values of Jan-2016 with 0\n", "# Remember: the \"no_of_trips\" represents the number of pickups that happened \n", "jan_2016_smooth = fill_missing(jan_2016_groupby['no_of_trips'].values,jan_2016_unique)" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "number of 10 min intervals among all the clusters 178560\n" ] } ], "source": [ "# number of 10 min indices for jan 2016 = 24*31*60/10 = 4464\n", "# for each cluster we will have 4464 values\n", "# therefore length of the jan_2016_smooth = 40*4464 = 178560\n", "print(\"number of 10 min intervals among all the clusters \",len(jan_2016_smooth))" ] }, { "cell_type": "code", "execution_count": 417, "metadata": { "collapsed": true }, "outputs": [], "source": [ "baseline_df = pd.DataFrame()\n", "baseline_df['Prediction']=jan_2016_smooth" ] }, { "cell_type": "code", "execution_count": 418, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(178560, 1)" ] }, "execution_count": 418, "metadata": {}, "output_type": "execute_result" } ], "source": [ "baseline_df.shape" ] }, { "cell_type": "code", "execution_count": 420, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>Prediction</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>0</th><td>0</td></tr>\n", "    <tr><th>1</th><td>63</td></tr>\n", "    <tr><th>2</th><td>217</td></tr>\n", "    <tr><th>3</th><td>189</td></tr>\n", "    <tr><th>4</th><td>137</td></tr>\n", "    <tr><th>5</th><td>135</td></tr>\n", "    <tr><th>6</th><td>129</td></tr>\n", "    <tr><th>7</th><td>150</td></tr>\n", "    <tr><th>8</th><td>164</td></tr>\n", "    <tr><th>9</th><td>152</td></tr>\n", "  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "   Prediction\n", "0           0\n", "1          63\n", "2         217\n", "3         189\n", "4         137\n", "5         135\n", "6         129\n", "7         150\n", "8         164\n", "9         152" ] }, "execution_count": 420, "metadata": {}, "output_type": "execute_result" } ], "source": [ "baseline_df.head(10)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Modelling: Baseline Models\n", "\n", "Now we turn to modelling, forecasting the pickup densities for January 2016 using:\n", "* Previously known values of the 2016 data itself to predict future values" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Simple Moving Averages\n", "The first model is the Simple Moving Average, which uses the mean of the previous n values to predict the next value." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We use the moving average of the 2016 values themselves to predict the next value: $\\begin{align}P_{t} = ( P_{t-1} + P_{t-2} + P_{t-3} + \\dots + P_{t-n} )/n \\end{align}$" ] },
{ "cell_type": "code", "execution_count": 423, "metadata": {}, "outputs": [], "source": [ "def MA_Predictions(df, window_size):\n", "    # window_size is the hyperparameter\n", "    predicted_value=(df['Prediction'].values)[0]\n", "    error=[]\n", "    predicted_values=[]\n", "    for i in range(0,4464*40):\n", "        k = i%4464\n", "        if k == 0:\n", "            # first bin of a cluster: no history yet, so predict 0 and record no error\n", "            predicted_values.append(0)\n", "            error.append(0)\n", "            predicted_value=(df['Prediction'].values)[i]\n", "            continue\n", "        predicted_values.append(predicted_value)\n", "        error.append(abs(predicted_value-(df['Prediction'].values)[i]))\n", "        if k+1>=window_size:\n", "            predicted_value=int(sum((df['Prediction'].values)[(i+1)-window_size:(i+1)])/window_size)\n", "        else:\n", "            # warm-up: only k+1 bins of this cluster are available, so average over those k+1 values\n", "            predicted_value=int(sum((df['Prediction'].values)[i-k:(i+1)])/(k+1))\n", "    \n", "    df['MA_Predicted'] = predicted_values\n", "    df['MA_Error'] = error\n", "    # MAPE here = mean absolute error / mean actual value\n", "    mape_err = (sum(error)/len(error))/(sum(df['Prediction'].values)/len(df['Prediction'].values))\n", "    mse_err = sum([e**2 for e in error])/len(error)\n", "    return df,mape_err,mse_err" ] },
{ "cell_type": "code", "execution_count": 424, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "For win_size = 1 MAPE = 0.142812087087 MSE = 174.436312724\n", "For win_size = 2 MAPE = 0.137720502279 MSE = 170.546051747\n", "For win_size = 3 MAPE = 0.141693704842 MSE = 183.141683468\n", "For win_size = 4 MAPE = 0.149060700302 MSE = 202.277587366\n", "For win_size = 5 MAPE = 0.157430893592 MSE = 224.104469086\n" ] } ], "source": [ "# Hyperparameter Tuning\n", "# MA_Predictions returns (df, MAPE, MSE)\n", "for i in range(1,6):\n", "    _,mean_err,median_err = MA_Predictions(baseline_df, i)\n", "    print(\"For win_size =\", i, \"MAPE =\", mean_err, \"MSE =\", median_err)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The window-size hyperparameter was tuned manually; a window size of 2 gives the lowest MAPE and MSE for Simple Moving Averages on the 2016 values, which gives:
$\\begin{align}P_{t} = (P_{t-1}+P_{t-2})/2 \\end{align}$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Building the model with the best value of the hyperparameter." ] }, { "cell_type": "code", "execution_count": 425, "metadata": {}, "outputs": [], "source": [ "baseline_df,mean_err,median_err = MA_Predictions(baseline_df, 2)" ] }, { "cell_type": "code", "execution_count": 426, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>Prediction</th><th>MA_Predicted</th><th>MA_Error</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>0</th><td>0</td><td>0</td><td>0</td></tr>\n", "    <tr><th>1</th><td>63</td><td>0</td><td>63</td></tr>\n", "    <tr><th>2</th><td>217</td><td>31</td><td>186</td></tr>\n", "    <tr><th>3</th><td>189</td><td>140</td><td>49</td></tr>\n", "    <tr><th>4</th><td>137</td><td>203</td><td>66</td></tr>\n", "  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "   Prediction  MA_Predicted  MA_Error\n", "0           0             0         0\n", "1          63             0        63\n", "2         217            31       186\n", "3         189           140        49\n", "4         137           203        66" ] }, "execution_count": 426, "metadata": {}, "output_type": "execute_result" } ], "source": [ "baseline_df.head()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Weighted Moving Averages\n", "The Moving Averages model above gives equal importance to all the values in the window, but intuitively the future is more likely to resemble the most recent values than the older ones. Weighted Moving Averages turn this intuition into a mathematical relationship, giving the highest weight to the most recent value and decreasing weights to progressively older ones." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Weighted Moving Averages using previous 2016 values: $\\begin{align}P_{t} = ( N*P_{t-1} + (N-1)*P_{t-2} + (N-2)*P_{t-3} + \\dots + 1*P_{t-N} )/(N*(N+1)/2) \\end{align}$" ] },
{ "cell_type": "code", "execution_count": 427, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def WA_Predictions(df, window_size):\n", "    # window_size is the hyperparameter\n", "    predicted_value=(df['Prediction'].values)[0]\n", "    error=[]\n", "    predicted_values=[]\n", "    for i in range(0,4464*40):\n", "        k=i%4464\n", "        if k==0:\n", "            # first bin of a cluster: no history yet, so predict 0 and record no error\n", "            predicted_values.append(0)\n", "            error.append(0)\n", "            predicted_value=(df['Prediction'].values)[i]\n", "            continue\n", "        predicted_values.append(predicted_value)\n", "        error.append(abs(predicted_value-(df['Prediction'].values)[i]))\n", "        if k+1>=window_size:\n", "            sum_values=0\n", "            sum_of_coeff=0\n", "            # weight window_size for the latest bin (index i), down to 1 for the oldest bin in the window\n", "            for j in range(window_size,0,-1):\n", "                sum_values += j*(df['Prediction'].values)[i-window_size+j]\n", "                sum_of_coeff+=j\n", "            predicted_value=int(sum_values/sum_of_coeff)\n", "\n", "        else:\n", "            # warm-up: only k+1 bins of this cluster are available, so weight those k+1 values\n", "            sum_values=0\n", "            sum_of_coeff=0\n", "            for j in range(k+1,0,-1):\n", "                sum_values += j*(df['Prediction'].values)[i-(k+1)+j]\n", "                sum_of_coeff+=j\n", "            predicted_value=int(sum_values/sum_of_coeff)\n", "    \n", "    df['WA_Predicted'] = predicted_values\n", "    df['WA_Error'] = error\n", "    # MAPE here = mean absolute error / mean actual value\n", "    mape_err = (sum(error)/len(error))/(sum(df['Prediction'].values)/len(df['Prediction'].values))\n", "    mse_err = sum([e**2 for e in error])/len(error)\n", "    return df,mape_err,mse_err" ] },
{ "cell_type": "code", "execution_count": 428, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "For win_size = 1 MAPE = 0.142812087087 MSE = 174.436312724\n", "For win_size = 2 MAPE = 0.13542001875 MSE = 162.216829077\n", "For win_size = 3 MAPE = 0.136067200587 MSE = 167.878970654\n", "For win_size = 4 MAPE = 0.139417728635 MSE = 177.41140793\n", "For win_size = 5 MAPE = 0.143881454687 MSE = 188.629037858\n" ] }
], "source": [ "# Hyperparameter tuning\n", "# WA_Predictions returns (df, MAPE, MSE)\n", "for i in range(1, 6):\n", "    _,mean_err,median_err = WA_Predictions(baseline_df, i)\n", "    print(\"For win_size =\", i, \"MAPE =\", mean_err, \"MSE =\", median_err)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The window-size hyperparameter was tuned manually; a window size of 2 gives the lowest MAPE and MSE for Weighted Moving Averages on the 2016 values, which gives:
$\\begin{align} P_{t} = ( 2*P_{t-1} + P_{t-2} )/3 \\end{align}$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Building the model with the best value of the hyperparameter." ] }, { "cell_type": "code", "execution_count": 429, "metadata": { "collapsed": true }, "outputs": [], "source": [ "baseline_df,mean_err,median_err = WA_Predictions(baseline_df, 2)" ] }, { "cell_type": "code", "execution_count": 430, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>Prediction</th><th>MA_Predicted</th><th>MA_Error</th><th>WA_Predicted</th><th>WA_Error</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>0</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n", "    <tr><th>1</th><td>63</td><td>0</td><td>63</td><td>0</td><td>63</td></tr>\n", "    <tr><th>2</th><td>217</td><td>31</td><td>186</td><td>42</td><td>175</td></tr>\n", "    <tr><th>3</th><td>189</td><td>140</td><td>49</td><td>165</td><td>24</td></tr>\n", "    <tr><th>4</th><td>137</td><td>203</td><td>66</td><td>198</td><td>61</td></tr>\n", "  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "   Prediction  MA_Predicted  MA_Error  WA_Predicted  WA_Error\n", "0           0             0         0             0         0\n", "1          63             0        63             0        63\n", "2         217            31       186            42       175\n", "3         189           140        49           165        24\n", "4         137           203        66           198        61" ] }, "execution_count": 430, "metadata": {}, "output_type": "execute_result" } ], "source": [ "baseline_df.head()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Exponential Weighted Moving Averages\n", " https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average\n", "\n", "With weighted averages we gave higher weights to the latest value and decreasing weights to the older ones, but we still do not know which weighting scheme is correct: there are infinitely many ways to assign weights in non-increasing order, in addition to tuning the window-size hyperparameter. To simplify this, we use Exponential Moving Averages, a more principled way of assigning weights that also determines the effective window size.\n", "\n", "Exponential moving averages use a single hyperparameter alpha $\\begin{align}(\\alpha)\\end{align}$, a value between 0 and 1, which determines both the weights and the effective window size.\n", "\n", "For example, since $\\begin{align}\\alpha\\end{align}$ is the weight given to the latest actual value, the weights of older values shrink by a factor of $\\begin{align}(1-\\alpha)\\end{align}$ per 10-min bin. Using the common correspondence $\\begin{align}\\alpha=2/(N+1)\\end{align}$, where N = number of prior values effectively being considered, N = 10 gives $\\begin{align}\\alpha=2/11\\approx0.18\\end{align}$: the latest value is assigned a weight of 0.18, which keeps decreasing exponentially for older values. A value such as $\\begin{align}\\alpha=0.9\\end{align}$ instead puts almost all of the weight on the most recent bin." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "$\\begin{align}P^{'}_{t} = \\alpha*P_{t-1} + (1-\\alpha)*P^{'}_{t-1} \\end{align}$" ] },
{ "cell_type": "code", "execution_count": 434, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def EA_Predictions(df, alpha):\n", "    # alpha is the hyperparameter\n", "    predicted_value= (df['Prediction'].values)[0]\n", "    error=[]\n", "    predicted_values=[]\n", "    for i in range(0,4464*40):\n", "        if i%4464==0:\n", "            # first bin of a cluster: no history yet, so predict 0 and record no error\n", "            predicted_values.append(0)\n", "            error.append(0)\n", "            predicted_value= (df['Prediction'].values)[i]\n", "            continue\n", "        predicted_values.append(predicted_value)\n", "        error.append(abs(predicted_value-(df['Prediction'].values)[i]))\n", "        \n", "        # Predicted(t) = alpha*Actual(t-1) + (1-alpha)*Predicted(t-1)\n", "        predicted_value =int((alpha*((df['Prediction'].values)[i])) + (1-alpha)*predicted_value)\n", "    \n", "    df['EA_Predicted'] = predicted_values\n", "    df['EA_Error'] = error\n", "    # MAPE here = mean absolute error / mean actual value\n", "    mape_err = (sum(error)/len(error))/(sum(df['Prediction'].values)/len(df['Prediction'].values))\n", "    mse_err = sum([e**2 for e in error])/len(error)\n", "    return df,mape_err,mse_err" ] },
{ "cell_type": "code", "execution_count": 435, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "For alpha = 0.1 MAPE = 0.26827270693 MSE = 612.432801299\n", "For alpha = 0.2 MAPE = 0.187630230499 MSE = 311.542781138\n", "For alpha = 0.3 MAPE = 
0.158402420419 MSE = 223.529452285\n", "For alpha = 0.4 MAPE = 0.144659373664 MSE = 186.786603943\n", "For alpha = 0.5 MAPE = 0.137888471699 MSE = 169.442389113\n", "For alpha = 0.6 MAPE = 0.13542623984 MSE = 162.311027106\n", "For alpha = 0.7 MAPE = 0.135159675576 MSE = 160.319517249\n", "For alpha = 0.8 MAPE = 0.136416524194 MSE = 161.955437948\n", "For alpha = 0.9 MAPE = 0.138936065485 MSE = 166.759963038\n", "For alpha = 1.0 MAPE = 0.142812087087 MSE = 174.436312724\n" ] } ], "source": [ "# Hyperparameter tuning\n", "# EA_Predictions returns (df, MAPE, MSE)\n", "for i in np.linspace(0.1,1,10):\n", "    _,mean_err,median_err = EA_Predictions(baseline_df, i)\n", "    print(\"For alpha =\", i, \"MAPE =\", mean_err, \"MSE =\", median_err)" ] }, { "cell_type": "code", "execution_count": 439, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "For alpha = 0.6 MAPE = 0.13542623984 MSE = 162.311027106\n", "For alpha = 0.625 MAPE = 0.135207182078 MSE = 161.42562164\n", "For alpha = 0.65 MAPE = 0.135032991569 MSE = 160.792288306\n", "For alpha = 0.675 MAPE = 0.13513281178 MSE = 160.518850806\n", "For alpha = 0.7 MAPE = 0.13515845021 MSE = 160.317909946\n", "For alpha = 0.725 MAPE = 0.135336788112 MSE = 160.432269265\n", "For alpha = 0.75 MAPE = 0.135573943588 MSE = 160.658187724\n", "For alpha = 0.775 MAPE = 0.135967757412 MSE = 161.276937724\n", "For alpha = 0.8 MAPE = 0.136416524194 MSE = 161.955437948\n" ] } ], "source": [ "# Tuning the value of alpha between 0.6 and 0.8\n", "for i in np.linspace(.6, .8, 9):\n", "    _,mean_err,median_err = EA_Predictions(baseline_df, i)\n", "    print(\"For alpha =\", i, \"MAPE =\", mean_err, \"MSE =\", median_err)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The hyperparameter alpha was tuned manually; an alpha of 0.65 gives the lowest MAPE for Exponentially Weighted Moving Averages on the 2016 values (the MSE is marginally lower at 0.7), which gives:
$\\begin{align}P^{'}_{t} = 0.65*P_{t-1} + 0.35*P^{'}_{t-1} \\end{align}$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Building the model with the best value of the hyperparameter." ] }, { "cell_type": "code", "execution_count": 440, "metadata": { "collapsed": true }, "outputs": [], "source": [ "baseline_df, mean_err, median_err = EA_Predictions(baseline_df, 0.65)" ] }, { "cell_type": "code", "execution_count": 441, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\"><th></th><th>Prediction</th><th>MA_Predicted</th><th>MA_Error</th><th>WA_Predicted</th><th>WA_Error</th><th>EA_Predicted</th><th>EA_Error</th></tr>\n", "  </thead>\n", "  <tbody>\n", "    <tr><th>0</th><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n", "    <tr><th>1</th><td>63</td><td>0</td><td>63</td><td>0</td><td>63</td><td>0</td><td>63</td></tr>\n", "    <tr><th>2</th><td>217</td><td>31</td><td>186</td><td>42</td><td>175</td><td>40</td><td>177</td></tr>\n", "    <tr><th>3</th><td>189</td><td>140</td><td>49</td><td>165</td><td>24</td><td>155</td><td>34</td></tr>\n", "    <tr><th>4</th><td>137</td><td>203</td><td>66</td><td>198</td><td>61</td><td>177</td><td>40</td></tr>\n", "  </tbody>\n", "</table>\n", "</div>" ], "text/plain": [ "   Prediction  MA_Predicted  MA_Error  WA_Predicted  WA_Error  EA_Predicted  \\\n", "0           0             0         0             0         0             0  \n", "1          63             0        63             0        63             0  \n", "2         217            31       186            42       175            40  \n", "3         189           140        49           165        24           155  \n", "4         137           203        66           198        61           177  \n", "\n", "   EA_Error  \n", "0         0  \n", "1        63  \n", "2       177  \n", "3        34  \n", "4        40  " ] }, "execution_count": 441, "metadata": {}, "output_type": "execute_result" } ], "source": [ "baseline_df.head()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Comparison between baseline models\n", "We use MAPE (Mean Absolute Percentage Error) as the error metric for comparing models, since it tells us how good the predictions are on average. We also report MSE (Mean Squared Error), which penalises large errors heavily and so shows how well each forecasting model copes with outliers, helping to ensure that the gap between the predicted and actual values stays small." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "From the values above, the best baseline forecasting model is:\n", "$\\begin{align}P^{'}_{t} = \\alpha*P_{t-1} + (1-\\alpha)*P^{'}_{t-1} \\end{align}$ i.e. Exponential Moving Averages using 2016 values" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Regression Models\n", "## Data Preparation\n", "To use regression ML algorithms, we create the following dataset:\n", "\n", "Features:\n", "* Latitude of the cluster\n", "* Longitude of the cluster\n", "* No. of pickups in the last 10-min bin in that cluster\n", "* No. of pickups in the second last 10-min bin in that cluster\n", "* No. of pickups in the third last 10-min bin in that cluster\n", "* No. of pickups in the fourth last 10-min bin in that cluster\n", "* No. 
of pickups in the fifth last 10-min bin in that cluster\n", "* Exponentially weighted average prediction for the current bin in that cluster\n", "\n", "Output variable\n", "* The number of pickups in the current bin of a given cluster\n", "\n", "Q) Why are we adding the last feature (Exponentially weighted average prediction)?\n", "\n", "Ans) Among the baseline models, the exponential weighted moving average gave the least error, so we add the exponential weighted moving average at timestamp = t as a feature to our data for regression.\n", "\n", "Diagrammatically, the dataset looks as follows:\n", "\n", "![title](RegressionDataset.png)\n", "\n", "**No. of data points** = 4459\\*40 = 178360\n", "\n", "**No. of features** = 8" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Train-Test Split\n", "Clearly this is a time series problem and we already have the no. of pickups sorted by time (10-min time bin) for each cluster. So we will do a 70-30 time-based split and not a random split.\n", "\n", "Q) Can we perform the time-based split on the above dataset as it is?\n", "\n", "Ans) **NO**. The dataset stacks the clusters one after another, so a single 70-30 cut over all rows would put whole clusters in the train set and other whole clusters in the test set.\n", "\n", "**For each of the 40 clusters**, we want the first 70% of data points to go to the train set and the last 30% of data points to the test set.\n", "\n", "So we need to perform a 70-30 split within each cluster and:\n", "* Concatenate the training sets of all the 40 clusters into one training set.\n", "* Concatenate the test sets of all the 40 clusters into one test set.\n", "\n", "**Note:** The train-test split is done after preparing the data, but this point is made here for clarity." ] }, { "cell_type": "code", "execution_count": 254, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "178560" ] }, "execution_count": 254, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(jan_2016_smooth)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"jan_2016_smooth\" is a flattened list which contains the no. of pickups for all the 4464 pickup bins in all the 40 clusters, i.e.\n", "* jan_2016_smooth[0:4464] => no. of pickups in all the bins of cluster 0\n", "* jan_2016_smooth[4464:4464\\*2] => no. of pickups in all the bins of cluster 1\n", "* jan_2016_smooth[4464\\*2:4464\\*3] => no. of pickups in all the bins of cluster 2\n", "* And so on ...\n", "\n", "Now we create a list named \"regions\" which is a list of lists. It contains 40 lists, one per cluster, each holding the 4464 values that represent the number of pickups in that cluster for Jan 2016, i.e.\n", "* regions[0] => contains 4464 values corresponding to no. of pickups in all the bins of cluster 0\n", "* regions[1] => contains 4464 values corresponding to no. of pickups in all the bins of cluster 1\n", "* regions[2] => contains 4464 values corresponding to no. of pickups in all the bins of cluster 2\n", "* And so on ...\n", "* regions[39] => contains 4464 values corresponding to no. 
of pickups in all the bins of cluster 39" ] }, { "cell_type": "code", "execution_count": 248, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Making a list of all the pickup values in every bin for Jan 2016 and storing them region-wise\n", "regions = []\n", "\n", "for i in range(0,40):\n", "    regions.append(jan_2016_smooth[4464*i:4464*(i+1)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Creating latitude and longitude features\n", "latitude will contain 40 lists, one for each cluster:\n", "* latitude[0] => contains latitude of cluster 0, repeated 4459 times\n", "* latitude[1] => contains latitude of cluster 1, repeated 4459 times\n", "* And so on ...\n", "* latitude[39] => contains latitude of cluster 39, repeated 4459 times\n", "\n", "Similarly, longitude will contain 40 lists, one for each cluster:\n", "* longitude[0] => contains longitude of cluster 0, repeated 4459 times\n", "* longitude[1] => contains longitude of cluster 1, repeated 4459 times\n", "* And so on ...\n", "* longitude[39] => contains longitude of cluster 39, repeated 4459 times" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since every data point uses the no. of pickups in the **last five** 10-min intervals as features, we only consider bins 5 to 4463 (instead of 0 to 4463): the first 5 bins do not have 5 previous bins." ] }, { "cell_type": "code", "execution_count": 243, "metadata": {}, "outputs": [], "source": [ "latitude = []\n", "longitude = []\n", "for i in range(0,40):\n", "    latitude.append([kmeans.cluster_centers_[i][0]]*4459)\n", "    longitude.append([kmeans.cluster_centers_[i][1]]*4459)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Creating the features corresponding to the no. 
of pickups in the last five 10-min bins" ] }, { "cell_type": "code", "execution_count": 366, "metadata": { "collapsed": true }, "outputs": [], "source": [ "previous_5_bin_pickups=[]\n", "for i in range(40):\n", "    cluster_i_features=[]\n", "    for r in range(4459):\n", "        # regions[i][r:r+5] holds the 5 bins that precede bin r+5\n", "        cluster_i_features.append(regions[i][r:r+5])\n", "    previous_5_bin_pickups.append(cluster_i_features)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below is another way of writing the code in the above cell, using a list comprehension" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "#for i in range(0,40):\n", "    #previous_5_bin_pickups.append([regions[i][r:r+5] for r in range(4459)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note:** previous_5_bin_pickups is a list of lists of lists\n", "* The outer list contains 40 values (one for each cluster)\n", "* Each middle list contains 4459 values, one for each bin of the given cluster\n", "* Each innermost list contains 5 values corresponding to the no. 
of pickups in the last five 10-min bins for a given bin" ] }, { "cell_type": "code", "execution_count": 367, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "40" ] }, "execution_count": 367, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(previous_5_bin_pickups)" ] }, { "cell_type": "code", "execution_count": 368, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4459" ] }, "execution_count": 368, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(previous_5_bin_pickups[0])" ] }, { "cell_type": "code", "execution_count": 369, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5" ] }, "execution_count": 369, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(previous_5_bin_pickups[0][0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We convert the previous_5_bin_pickups into a numpy array by stacking all the features vertically, one above the other." ] }, { "cell_type": "code", "execution_count": 370, "metadata": { "collapsed": true }, "outputs": [], "source": [ "previous_5_bin_pickups = np.vstack(previous_5_bin_pickups)" ] }, { "cell_type": "code", "execution_count": 371, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(178360, 5)" ] }, "execution_count": 371, "metadata": {}, "output_type": "execute_result" } ], "source": [ "previous_5_bin_pickups.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below code is an example of what we did:" ] }, { "cell_type": "code", "execution_count": 374, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]], [[11, 12, 13], [12, 13, 14], [13, 14, 15], [14, 15, 16]], [[21, 22, 23], [22, 23, 24], [23, 24, 25], [24, 25, 26]]]\n", "\n", "\n", "\n", "\n", "[[ 1 2 3]\n", " [ 2 3 4]\n", " [ 3 4 5]\n", " [ 4 5 6]\n", " [11 12 13]\n", " [12 13 14]\n", " [13 14 15]\n", " [14 15 16]\n", " [21 22 23]\n", " [22 23 24]\n", " [23 24 25]\n", " 
[24 25 26]]\n" ] } ], "source": [ "q1 = [[1,2,3],[2,3,4],[3,4,5],[4,5,6]]\n", "q2 = [[11,12,13],[12,13,14],[13,14,15],[14,15,16]]\n", "q3 = [[21,22,23],[22,23,24],[23,24,25],[24,25,26]]\n", "t=[]\n", "t.append(q1)\n", "t.append(q2)\n", "t.append(q3)\n", "print(t)\n", "print(\"\\n\\n\\n\")\n", "\n", "u = np.vstack(t)\n", "print(u)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Creating the output variable (yi)" ] }, { "cell_type": "code", "execution_count": 375, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# output variable\n", "# it is a list of lists\n", "# it will contain the number of pickups for the 4459 bins (5 to 4463) of each cluster\n", "output = []\n", "for i in range(0,40):\n", "    output.append(regions[i][5:])" ] }, { "cell_type": "code", "execution_count": 376, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "40" ] }, "execution_count": 376, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(output)" ] }, { "cell_type": "code", "execution_count": 377, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4459" ] }, "execution_count": 377, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(output[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Creating feature for exponentially weighted average prediction for each bin in each cluster" ] }, { "cell_type": "code", "execution_count": 378, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# We use the value found by the hyperparameter search (0.65 on the latest actual value).\n", "# In the update below, alpha is the weight on the previous prediction and (1-alpha) the weight\n", "# on the latest actual value, so alpha = 1 - 0.65 = 0.35\n", "alpha=0.35\n", "\n", "# predicted_values is a temporary array that stores the exponential weighted moving average\n", "# for every 10-min bin in cluster 'r'\n", "predicted_values=[]\n", "\n", "# exp_avg_pred is a list of lists similar to latitude or longitude; it contains values as follows:\n", "# [[x5,x6,x7,....,x4463], cluster 0\n", "# [x5,x6,x7,....,x4463], cluster 1\n", "# [x5,x6,x7,....,x4463], cluster 2\n", "# And so on .....,\n", "# [x5,x6,x7,....,x4463]] cluster 39\n", "exp_avg_pred = []\n", "\n", "# The code below is similar to the function \"EA_Predictions\" in Baseline models,\n", "# with only minor modifications\n", "for r in range(0,40):\n", "    for i in range(0,4464):\n", "        if i%4464==0:\n", "            predicted_value = regions[r][0]\n", "            predicted_values.append(0)\n", "            continue\n", "        predicted_values.append(predicted_value)\n", "        predicted_value =int((alpha*predicted_value) + (1-alpha)*(regions[r][i]))\n", "    # drop the first 5 bins so that the feature lines up with bins 5 to 4463\n", "    exp_avg_pred.append(predicted_values[5:])\n", "    predicted_values=[]" ] }, { "cell_type": "code", "execution_count": 380, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "40" ] }, "execution_count": 380, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(exp_avg_pred)" ] }, { "cell_type": "code", "execution_count": 381, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "4459" ] }, "execution_count": 381, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(exp_avg_pred[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Calculating the size of training and test sets in terms of no. of bins for one cluster" ] }, { "cell_type": "code", "execution_count": 382, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "size of train data : 3121\n", "size of test data : 1338\n" ] } ], "source": [ "print(\"size of train data :\", int(4459*0.7)) # First 70%\n", "print(\"size of test data :\", 4459-int(4459*0.7)) # Last 30%" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This means that, **from every cluster**, the first 3121 data points will go to the training set and the last 1338 points will go to the test set.\n", "\n", "Thus:\n", "* Total no. of points in training set = 3121\\*40 = 124840\n", "* Total no. 
of points in test set = 1338\\*40 = 53520" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Train-Test Split\n", "#### Splitting \"previous_5_bin_pickups\"" ] }, { "cell_type": "code", "execution_count": 383, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# extracting first 3121 timestamp values for our training data from each cluster\n", "train_features = [previous_5_bin_pickups[i*4459:(4459*i+3121)] for i in range(0,40)]\n", "\n", "# extracting last 1338 timestamp values for our test data from each cluster\n", "test_features = [previous_5_bin_pickups[(4459*i)+3121:4459*(i+1)] for i in range(0,40)]" ] }, { "cell_type": "code", "execution_count": 388, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(40, 3121, 5)" ] }, "execution_count": 388, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(train_features), len(train_features[0]), len(train_features[0][0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Thus in the train set, for all the 40 clusters, we have data of the **initial 3121** 10-min bins and each data point contains **5** features corresponding to the no. of pickups in the last five 10-min bins." ] }, { "cell_type": "code", "execution_count": 389, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(40, 1338, 5)" ] }, "execution_count": 389, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(test_features), len(test_features[0]), len(test_features[0][0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Thus in the test set, for all the 40 clusters, we have data of the **last 1338** 10-min bins and each data point contains **5** features corresponding to the no. of pickups in the last five 10-min bins."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Splitting \"latitude\", \"longitude\", \"output\" and \"exp_avg_pred\"\n", "Extracting the first 70% of the points to create the training set" ] }, { "cell_type": "code", "execution_count": 390, "metadata": {}, "outputs": [], "source": [ "# extracting the first 3121 timestamp values, i.e. 70% of 4459 (total timestamps), for our training data\n", "train_lat = [i[:3121] for i in latitude]\n", "train_lon = [i[:3121] for i in longitude]\n", "train_output = [i[:3121] for i in output]\n", "train_exp_avg = [i[:3121] for i in exp_avg_pred]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Extracting the last 30% of the points to create the test set" ] }, { "cell_type": "code", "execution_count": 391, "metadata": {}, "outputs": [], "source": [ "# extracting the rest of the timestamp values, i.e. 30% of 4459 (total timestamps), for our test data\n", "test_lat = [i[3121:] for i in latitude]\n", "test_lon = [i[3121:] for i in longitude]\n", "test_output = [i[3121:] for i in output]\n", "test_exp_avg = [i[3121:] for i in exp_avg_pred]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Important:\n", "* All the training and test set features are currently lists of lists, where the data for each cluster sits in a separate inner list.\n", "* So we flatten these nested lists into one single list. This is equivalent to concatenating the data for all the clusters.\n", "* This is done for both the train set and the test set." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Flattening the \"train_features\" and \"test_features\" lists**" ] }, { "cell_type": "code", "execution_count": 392, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# the above contains values in the form of list of lists (i.e. 
list of values of each region),\n", "# here we make all of them in one list\n", "train_flat_features = []\n", "for i in range(0,40):\n", "    train_flat_features.extend(train_features[i])\n", "test_flat_features = []\n", "for i in range(0,40):\n", "    test_flat_features.extend(test_features[i])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Flattening the latitude, longitude, output and exp_avg_pred features of the train set**" ] }, { "cell_type": "code", "execution_count": 393, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# converting a list of lists into a single list, i.e. flattening\n", "# a = [[1,2,3,4],[4,6,7,8]]\n", "# print(sum(a,[]))\n", "# [1, 2, 3, 4, 4, 6, 7, 8]\n", "# Explanation => https://stackoverflow.com/a/33542010/8528893\n", "\n", "train_flat_lat = sum(train_lat, [])\n", "train_flat_lon = sum(train_lon, [])\n", "train_flat_output = sum(train_output, [])\n", "train_flat_exp_avg = sum(train_exp_avg,[])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Flattening the latitude, longitude, output and exp_avg_pred features of the test set**" ] }, { "cell_type": "code", "execution_count": 394, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# converting a list of lists into a single list, i.e. flattening\n", "# a = [[1,2,3,4],[4,6,7,8]]\n", "# print(sum(a,[]))\n", "# [1, 2, 3, 4, 4, 6, 7, 8]\n", "# Explanation => https://stackoverflow.com/a/33542010/8528893\n", "\n", "test_flat_lat = sum(test_lat, [])\n", "test_flat_lon = sum(test_lon, [])\n", "test_flat_output = sum(test_output, [])\n", "test_flat_exp_avg = sum(test_exp_avg,[])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Finally preparing a dataframe which contains all the features\n", "**Train Dataframe**" ] }, { "cell_type": "code", "execution_count": 395, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(124840, 8)\n" ] } ], "source": [ "# Preparing the data frame for our train data\n", "columns = 
['ft_5','ft_4','ft_3','ft_2','ft_1']\n", "df_train = pd.DataFrame(data=train_flat_features, columns=columns) \n", "df_train['lat'] = train_flat_lat\n", "df_train['lon'] = train_flat_lon\n", "df_train['exp_avg'] = train_flat_exp_avg\n", "\n", "print(df_train.shape)" ] }, { "cell_type": "code", "execution_count": 397, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>ft_5</th><th>ft_4</th><th>ft_3</th><th>ft_2</th><th>ft_1</th><th>lat</th><th>lon</th><th>exp_avg</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>0</td><td>63</td><td>217</td><td>189</td><td>137</td><td>40.776228</td><td>-73.982119</td><td>151</td></tr>\n",
"<tr><th>1</th><td>63</td><td>217</td><td>189</td><td>137</td><td>135</td><td>40.776228</td><td>-73.982119</td><td>140</td></tr>\n",
"<tr><th>2</th><td>217</td><td>189</td><td>137</td><td>135</td><td>129</td><td>40.776228</td><td>-73.982119</td><td>132</td></tr>\n",
"<tr><th>3</th><td>189</td><td>137</td><td>135</td><td>129</td><td>150</td><td>40.776228</td><td>-73.982119</td><td>143</td></tr>\n",
"<tr><th>4</th><td>137</td><td>135</td><td>129</td><td>150</td><td>164</td><td>40.776228</td><td>-73.982119</td><td>156</td></tr>\n",
"</tbody>\n",
"</table>" ], "text/plain": [ "   ft_5  ft_4  ft_3  ft_2  ft_1        lat        lon  exp_avg\n", "0     0    63   217   189   137  40.776228 -73.982119      151\n", "1    63   217   189   137   135  40.776228 -73.982119      140\n", "2   217   189   137   135   129  40.776228 -73.982119      132\n", "3   189   137   135   129   150  40.776228 -73.982119      143\n", "4   137   135   129   150   164  40.776228 -73.982119      156" ] }, "execution_count": 397, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_train.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Test Dataframe**" ] }, { "cell_type": "code", "execution_count": 396, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(53520, 8)\n" ] } ], "source": [ "# Preparing the data frame for our test data\n", "df_test = pd.DataFrame(data=test_flat_features, columns=columns)\n", "df_test['lat'] = test_flat_lat\n", "df_test['lon'] = test_flat_lon\n", "df_test['exp_avg'] = test_flat_exp_avg\n", "print(df_test.shape)" ] }, { "cell_type": "code", "execution_count": 398, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead><tr><th></th><th>ft_5</th><th>ft_4</th><th>ft_3</th><th>ft_2</th><th>ft_1</th><th>lat</th><th>lon</th><th>exp_avg</th></tr></thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>119</td><td>121</td><td>126</td><td>115</td><td>104</td><td>40.776228</td><td>-73.982119</td><td>108</td></tr>\n",
"<tr><th>1</th><td>121</td><td>126</td><td>115</td><td>104</td><td>142</td><td>40.776228</td><td>-73.982119</td><td>130</td></tr>\n",
"<tr><th>2</th><td>126</td><td>115</td><td>104</td><td>142</td><td>141</td><td>40.776228</td><td>-73.982119</td><td>137</td></tr>\n",
"<tr><th>3</th><td>115</td><td>104</td><td>142</td><td>141</td><td>135</td><td>40.776228</td><td>-73.982119</td><td>135</td></tr>\n",
"<tr><th>4</th><td>104</td><td>142</td><td>141</td><td>135</td><td>146</td><td>40.776228</td><td>-73.982119</td><td>142</td></tr>\n",
"</tbody>\n",
"</table>" ], "text/plain": [ "   ft_5  ft_4  ft_3  ft_2  ft_1        lat        lon  exp_avg\n", "0   119   121   126   115   104  40.776228 -73.982119      108\n", "1   121   126   115   104   142  40.776228 -73.982119      130\n", "2   126   115   104   142   141  40.776228 -73.982119      137\n", "3   115   104   142   141   135  40.776228 -73.982119      135\n", "4   104   142   141   135   146  40.776228 -73.982119      142" ] }, "execution_count": 398, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_test.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using Linear Regression" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Training the model" ] }, { "cell_type": "code", "execution_count": 399, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Fit on the flattened training targets built above; the evaluation below uses the same list\n", "lr_reg = LinearRegression().fit(df_train, train_flat_output)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Train and Test set predictions" ] }, { "cell_type": "code", "execution_count": 403, "metadata": { "collapsed": true }, "outputs": [], "source": [ "y_pred = lr_reg.predict(df_test)\n", "lr_test_predictions = [round(value) for value in y_pred]\n", "y_pred = lr_reg.predict(df_train)\n", "lr_train_predictions = [round(value) for value in y_pred]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Evaluation" ] }, { "cell_type": "code", "execution_count": 401, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train MAPE = 0.13509442684\n", "Test MAPE = 0.134702921655\n" ] } ], "source": [ "# MAPE is computed here as MAE divided by the mean of the actual pickups\n", "mae_train = mean_absolute_error(train_flat_output, lr_train_predictions)\n", "avg_train_output = sum(train_flat_output)/len(train_flat_output)\n", "train_mape = mae_train/avg_train_output\n", "print(\"Train MAPE = \", train_mape)\n", "\n", "mae_test = mean_absolute_error(test_flat_output, 
lr_test_predictions)\n", "avg_test_output = sum(test_flat_output)/len(test_flat_output)\n", "test_mape = mae_test/avg_test_output\n", "print(\"Test MAPE = \", test_mape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using Random Forest Regressor" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Training the model" ] }, { "cell_type": "code", "execution_count": 402, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,\n", " max_features='sqrt', max_leaf_nodes=None,\n", " min_impurity_decrease=0.0, min_impurity_split=None,\n", " min_samples_leaf=4, min_samples_split=3,\n", " min_weight_fraction_leaf=0.0, n_estimators=40, n_jobs=-1,\n", " oob_score=False, random_state=None, verbose=0, warm_start=False)" ] }, "execution_count": 402, "metadata": {}, "output_type": "execute_result" } ], "source": [ "regr1 = RandomForestRegressor(max_features='sqrt',min_samples_leaf=4,min_samples_split=3,n_estimators=40, n_jobs=-1)\n", "# Fit on the flattened training targets built above; the evaluation below uses the same list\n", "regr1.fit(df_train, train_flat_output)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Train and Test set predictions" ] }, { "cell_type": "code", "execution_count": 405, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Predicting on the test and train data using our trained random forest model\n", "\n", "# regr1 uses hyperparameters that were already tuned;\n", "# the values above were found using grid search\n", "\n", "y_pred = regr1.predict(df_test)\n", "rndf_test_predictions = [round(value) for value in y_pred]\n", "y_pred = regr1.predict(df_train)\n", "rndf_train_predictions = [round(value) for value in y_pred]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Evaluating the model" ] }, { "cell_type": "code", "execution_count": 407, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train MAPE = 0.0968021019377\n", "Test MAPE = 0.133627690757\n" ] } ], "source": [ 
"mae_train = mean_absolute_error(train_flat_output, rndf_train_predictions)\n", "avg_train_output = sum(train_flat_output)/len(train_flat_output)\n", "train_mape = mae_train/avg_train_output\n", "print(\"Train MAPE = \", train_mape)\n", "\n", "mae_test = mean_absolute_error(test_flat_output, rndf_test_predictions)\n", "avg_test_output = sum(test_flat_output)/len(test_flat_output)\n", "test_mape = mae_test/avg_test_output\n", "print(\"Test MAPE = \", test_mape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Feature Importance" ] }, { "cell_type": "code", "execution_count": 408, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Index(['ft_5', 'ft_4', 'ft_3', 'ft_2', 'ft_1', 'lat', 'lon', 'exp_avg'], dtype='object')\n", "[ 0.07366849 0.11986027 0.18583002 0.16131902 0.15860672 0.00680606\n", " 0.00327033 0.29063909]\n" ] } ], "source": [ "# feature importances based on analysis using random forest\n", "print(df_train.columns)\n", "print(regr1.feature_importances_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Features in order of importance (scores from the output above, rounded):\n", "* exp_avg (0.291)\n", "* ft_3 (0.186)\n", "* ft_2 (0.161)\n", "* ft_1 (0.159)\n", "* ft_4 (0.120)\n", "* ft_5 (0.074)\n", "* lat (0.007)\n", "* lon (0.003)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Using XGBoost Regressor" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Training the model" ] }, { "cell_type": "code", "execution_count": 409, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,\n", " colsample_bytree=0.8, gamma=0, learning_rate=0.1, max_delta_step=0,\n", " max_depth=3, min_child_weight=3, missing=None, n_estimators=1000,\n", " n_jobs=1, nthread=4, objective='reg:linear', random_state=0,\n", " reg_alpha=200, reg_lambda=200, scale_pos_weight=1, seed=None,\n", " silent=True, subsample=0.8)" ] }, "execution_count": 409, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x_model = 
xgb.XGBRegressor(\n", "    learning_rate=0.1,\n", "    n_estimators=1000,\n", "    max_depth=3,\n", "    min_child_weight=3,\n", "    gamma=0,\n", "    subsample=0.8,\n", "    reg_alpha=200, reg_lambda=200,\n", "    colsample_bytree=0.8, nthread=4)\n", "# Fit on the flattened training targets built above; the evaluation below uses the same list\n", "x_model.fit(df_train, train_flat_output)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Train and Test Set Predictions" ] }, { "cell_type": "code", "execution_count": 410, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Predicting with our trained XGBoost regressor\n", "# x_model uses hyperparameters that were already tuned;\n", "# the values above were found using grid search\n", "\n", "y_pred = x_model.predict(df_test)\n", "xgb_test_predictions = [round(value) for value in y_pred]\n", "y_pred = x_model.predict(df_train)\n", "xgb_train_predictions = [round(value) for value in y_pred]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Evaluating the model" ] }, { "cell_type": "code", "execution_count": 411, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train MAPE = 0.129657928719\n", "Test MAPE = 0.13344030325\n" ] } ], "source": [ "mae_train = mean_absolute_error(train_flat_output, xgb_train_predictions)\n", "avg_train_output = sum(train_flat_output)/len(train_flat_output)\n", "train_mape = mae_train/avg_train_output\n", "print(\"Train MAPE = \", train_mape)\n", "\n", "mae_test = mean_absolute_error(test_flat_output, xgb_test_predictions)\n", "avg_test_output = sum(test_flat_output)/len(test_flat_output)\n", "test_mape = mae_test/avg_test_output\n", "print(\"Test MAPE = \", test_mape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Feature Importance" ] }, { "cell_type": "code", "execution_count": 414, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 414, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAZ8AAAEWCAYAAAC5XZqEAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl8FfX1//HXgYgmIFEKUgICIojsYSlLXYhSEURRxAVK\nv2pbi9WitkUoLsWlWrFKxYW64NK6FCyCxYVqEbnAT0EKEhFBxEqURUWQCAGqBM7vj5nEAAkJS+bm\n3vt+Ph73wZ3PfGbuOZmQk89nJjPm7oiIiESpWrwDEBGR1KPiIyIikVPxERGRyKn4iIhI5FR8REQk\ncio+IiISORUfkSrGzB42s9/HOw6RymT6Ox9JFmaWB9QHdpZoPsHd1x3EPnOAZ9y90cFFl5jM7K/A\nGne/Kd6xSHLRyEeSzTnuXqvE64ALz6FgZmnx/PyDYWbV4x2DJC8VH0kJZtbdzN4ys3wzezcc0RSt\n+6mZLTezLWb2sZldEbbXBP4FZJlZQfjKMrO/mtntJbbPMbM1JZbzzOx3ZrYE2GpmaeF2U8zsSzNb\nZWbX7CPW4v0X7dvMRprZejP7zMzOM7OzzOxDM/vKzG4ose0tZva8mT0X5vOOmXUosb6VmcXCr8P7\nZtZ/j899yMymm9lW4OfAEGBkmPtLYb9RZvbfcP/LzGxAiX1cZmb/z8zuMbNNYa59S6yvY2ZPmtm6\ncP0/S6w728xyw9jeMrP2FT7AknBUfCTpmVlD4BXgdqAOcB0wxczqhV3WA2cDtYGfAveaWSd33wr0\nBdYdwEhqMNAPOArYBbwEvAs0BHoBvzazMyu4r+8DR4TbjgYmAD8BOgOnAKPNrFmJ/ucCk8Nc/w78\n08wOM7PDwjj+DRwDXA08a2YtS2z7Y+AO4EjgKeBZ4E9h7ueEff4bfm4mcCvwjJk1KLGPbsAKoC7w\nJ+BxM7Nw3dNABtAmjOFeADPrBDwBXAF8D3gEeNHMDq/g10gSjIqPJJt/hr8555f4rfonwHR3n+7u\nu9x9BrAQOAvA3V9x9/96YDbBD+dTDjKO+919tbtvB34A1HP329z9W3f/mKCADKrgvnYAd7j7DmAS\nwQ/1+9x9i7u/D7wPlBwlLHL358P+fyYoXN3DVy1gTBjHG8DLBIWyyDR3fzP8Ov2vtGDcfbK7rwv7\nPAesBLqW6PKJu09w953A34AGQP2wQPUFfunum9x9R/j1BvgF8Ii7v+3uO939b8A3YcyShBJ2Plqk\nDOe5++t7tDUBLjSzc0q0HQbMAginhW4GTiD4hSwDeO8g41i9x+dnmVl+ibbqwNwK7mtj+IMcYHv4\n7xcl1m8nKCp7fba77wqnBLOK1rn7rhJ9PyEYUZUWd6nM7BLgt0DTsKkWQUEs8nmJz98WDnpqEYzE\nvnL3TaXstglwqZldXaKtRom4Jcmo+EgqWA087e6/2HNFOK0zBbiE4Lf+HeGIqWiaqLTLQbcSFKgi\n3y+lT8ntVgOr3L3FgQR/AI4temNm1YBGQNF04bFmVq1EAWoMfFhi2z3z3W3ZzJoQjNp6AfPcfaeZ\n5fLd12tfVgN1zOwod88vZd0d7n5HBfYjSUDTbpIKngHOMbMzzay6mR0RnshvRPDb9eHAl0BhOArq\nXWLbL4DvmVlmibZc4Kzw5Pn3gV+X8/kLgM3hRQjpYQxtzewHhyzD3XU2s/PDK+1+TTB9NR94m6Bw\njgzPAeUA5xBM5ZXlC6Dk+aSaBAXpSwgu1gDaViQod/+M4AKOv5jZ0WEMp4arJwC/NLNuFqhpZv3M\n7MgK5iwJRsVHkp67ryY4CX8DwQ/N1cAIoJq7bwGuAf4BbCI44f5iiW0/ACYCH4fnkbIITpq/C+QR\nnB96rpzP30nwQz4bWAVsAB4jOGFfGaYBFxPk83/A+eH5lW+B/gTnXTYAfwEuCXMsy+NA66JzaO6+\nDBgLzCMoTO2AN/cjtv8jOIf1AcGFHr8GcPeFBOd9Hgzj/gi4b
D/2KwlGf2QqkkTM7Bagubv/JN6x\niOyLRj4iIhI5FR8REYmcpt1ERCRyGvmIiEjk9Hc+ZTjqqKO8efPm8Q6j0m3dupWaNWvGO4xKlwp5\npkKOoDyrukWLFm1w93rl9VPxKUP9+vVZuHBhvMOodLFYjJycnHiHUelSIc9UyBGUZ1VnZp9UpJ+m\n3UREJHIqPiIiEjkVHxERiZyKj4iIRE7FR0REIqfiIyIikVPxERGRyKn4iIhI5FR8REQkcio+IiIS\nORUfERGJnIqPiIhETsVHREQip+IjIiKRU/EREZHIqfiIiEjkVHxERCRyKj4iIhI5FR8RkST1s5/9\njGOOOYa2bdsWt3311VecccYZtGjRgjPOOINNmzYBwWO7MzMzyc7OJjs7m9tuuw2A1atXc9ppp9Gq\nVSvatGnDfffdd0hiU/EREUlSl112Ga+++upubWPGjKFXr16sXLmSXr16MWbMmOJ1p5xyCrm5ueTm\n5jJ69GgA0tLSGDt2LMuXL2f+/PmMHz+eZcuWHXRsaQe9hzgws2uAK4HawIXu/tY++l4G3A2sDZse\ndPfHyvuM7Tt20nTUK4cg2qpteLtCLlOeSSEVcgTlWRF5Y/oBcOqpp5KXl7fbumnTphGLxQC49NJL\nycnJ4a677ipzXw0aNKBBgwYAHHnkkbRq1Yq1a9fSunXrA4qtSKKOfK4CzgImAD+sQP/n3D07fJVb\neEREktUXX3xRXEwaNGjA+vXri9fNmzePDh060LdvX95///29ts3Ly2Px4sV069btoONIuJGPmT0M\nNAOWADWADWb2E+Bqd58b1+BERBJUp06d+OSTT6hVqxbTp0/nvPPOY+XKlcXrCwoKGDhwIOPGjaN2\n7doH/XkJV3zc/Zdm1gfoAgwDCtz9nnI2G2hmpwIfAr9x99WldTKzocBQgLp16zG6XeEhjLxqqp8e\nDO+TXSrkmQo5gvKsiKJpNYDPP/+crVu3FrfVrl2bKVOm8L3vfY+NGzdy5JFH7tYfICMjgy1btjBt\n2jQyMzMpLCzk+uuvp1u3btSpU2ev/gci4YrPAXgJmOju35jZL4G/AaeX1tHdHwUeBWjcrLmPfS/5\nvzzD2xWiPJNDKuQIyrMi8obkfPc+L4+aNWuSkxO0XXzxxaxcuZKBAwcyZswYBg0aRE5ODp9//jn1\n69fHzFiwYAE1atSgf//+QHBu6KSTTmLcuHEHm9Z33D3hXkAeUBe4BbhuP7arDnxdkb4nnHCCp4JZ\ns2bFO4RIpEKeqZCju/LcH4MGDfLvf//7npaW5g0bNvTHHnvMN2zY4Keffro3b97cTz/9dN+4caO7\nuz/wwAPeunVrb9++vXfr1s3ffPNNd3efO3euA96uXTvv0KGDd+jQwV955ZUyPxNY6BX4GZvovz5s\nIbjirUxm1sDdPwsX+wPLKz0qEZEqYOLEiaW2z5w5c6+2YcOGMWzYsL3aTz755KJf3g+pRL3archL\nwAAzyzWzU8roc42ZvW9m7wLXAJdFFp2IiJQqIUc+7t40fLsBaF9O3+uB6ys7JhERqbhEH/mIiEgC\nSsiRT2nM7Ebgwj2aJ7v7HfGIR0REypY0xScsMio0IiIJQNNuIiISORUfERGJnIqPiIhETsVHREQi\np+IjIiKRU/EREZHIqfiIiEjkVHxERCRyKj4iIhI5FR8REYmcio+IiEROxUdEJI7uu+8+2rZtS5s2\nbYofU33xxRdz+eWXk52dTdOmTcnOzgZgxowZdO7cmXbt2tG5c2feeOONeIZ+UKwynlBX2czsGuBK\ngqeYXujub+2j76nAOILn/gxy9+cr8hmNmzX3ahfddyjCrdIO5jnxiSQV8kyFHCF58swb04+lS5cy\naNAgFixYQI0aNejTpw8PPfQQLVq0IBaLkZOTw/Dhw8nMzGT06NEsXryY+vXrk5WVxdKlSznzzDNZ\nu3ZtvFPZjZktcvcu5fVL1
JHPVcBZwATgh+X0/ZTg6aV/r+SYRET2y/Lly+nevTsZGRmkpaXRs2dP\nXnjhheL17s4//vEPBg8eDEDHjh3JysoCoE2bNvzvf//jm2++iUvsByvhio+ZPQw0A5YQPKH0N/t6\njLa757n7EmBXhGGKiJSrbdu2zJkzh40bN7Jt2zamT5/O6tWri9fPnTuX+vXr06JFi722nTJlCh07\nduTwww+PMuRDJuHGru7+SzPrA3QBhgEF7n7Podi3mQ0FhgLUrVuP0e0KD8Vuq7T66cE0RrJLhTxT\nIUdInjxjsRgA5557Lj169CA9PZ0mTZrw+eefE4vFKCgoYMKECXTt2rW4b5FVq1Zx00038ac//Wmv\ndYki4YpPZXL3R4FHITjnkwzzyuVJlvnz8qRCnqmQIyRPnnlDcgDIycnh7rvvBuCGG26gUaNG5OTk\nMHPmTObPn8+iRYto1KhR8XZr1qxh6NCh/OMf/+Ckk06KR+iHROIfwUqSflh1VozpF+8wKl0sFiv+\nT5DMUiHPVMgRki/P9evXc8wxx/Dpp58ydepU5s2bB8CiRYs48cQTdys8+fn59OvXjzvvvDOhCw8k\n4DmfPWwBjox3ECIiB2rgwIG0bt2ac845h/Hjx3P00UcD8MYbbxRfaFDkwQcf5KOPPuIPf/gD2dnZ\nZGdns379+niEfdASfeTzEvC8mZ0LXO3uc/fsYGY/AF4AjgbOMbNb3b1NxHGKiJRq7ty9fmwBMGrU\nKHJycnZru+mmm7jpppsiiKryJWTxcfem4dsNBH+/s6++/wEa7auPiIhEK9Gn3UREJAEl5MinNGZ2\nI3DhHs2T3f2OeMQjIiJlS5riExYZFRoRkQSgaTcREYmcio+IiEROxUdERCKn4iMiIpFT8RERkcip\n+IiISORUfEREJHIqPiIiEjkVHxERiZyKj4iIRE7FR0REIqfiIyJyiNx33320bduWNm3aMG7cOABG\njBjBiSeeSPv27RkwYAD5+fkAPPvss8UPhMvOzqZatWrk5ubGM/xImbvHO4YqqXGz5l7tovviHUal\nG96ukLHvJc39ZcuUCnmmQo5QNfPMG9OPpUuXMmjQIBYsWECNGjXo06cPDz30EKtWreL0008nLS2N\n3/3udwDcddddu23/3nvvce655/Lxxx8Xt8Visb0eJpcIzGyRu3cpr19CjnzM7BozW25ma83sh+X0\n/a2ZLTOzJWY208yaRBWniKSO5cuX0717dzIyMkhLS6Nnz5688MIL9O7dm7S0oFh2796dNWvW7LXt\nxIkT93pkdrJLyOIDXAWcBUwA9ll8gMVAF3dvDzwP/KmSYxORFNS2bVvmzJnDxo0b2bZtG9OnT2f1\n6tW79XniiSfo27fvXts+99xzKVd8qtbYtQLM7GGgGbAEqAFsMLOfAFe7+14PQ3f3WSUW5wM/2ce+\nhwJDAerWrcfodoWHMvQqqX56MI2R7FIhz1TIEapmnrFYDIBzzz2XHj16kJ6eTpMmTfj888+L1z3z\nzDPk5+fTsGHD4jaAZcuW4e5s2LBht/aCgoLdlpNNQp7zMbM8oAswDChw93squN2DwOfufnt5fXXO\nJ7mkQp6pkCNUzTzzxvTbq+2GG26gUaNGXHXVVfztb3/j4YcfZubMmWRkZOzW7ze/+Q316tXjhhtu\n2K092c/5VK0jWInC0VEXoGdF+qcfVp0VpXxDJZtYLEbekJx4h1HpUiHPVMgRqnae69ev55hjjuHT\nTz9l6tSpzJs3j1dffZW77rqL2bNn71V4du3axeTJk5kzZ06cIo6flCg+ZvYj4Eagp7t/E+94RCQ5\nDRw4kI0bN3LYYYcxfvx4jj76aIYNG8Y333zDGWecAQQXHTz88MMAzJkzh0aNGtGsWbN4hh0XiV58\ntgC199XBzDoCjwB93H19JFGJSEqaO3ev08589NFHZfbPyclh/vz5lRlSlZWoV7sVeQkYYGa
5ZnZK\nGX3uBmoBk8N+L0YXnoiIlCYhRz7u3jR8uwFoX07fH1V6QCIisl8SfeQjIiIJKCFHPqUxsxuBC/do\nnuzud8QjHhERKVvSFJ+wyKjQiIgkAE27iYhI5FR8REQkcio+IiISORUfERGJnIqPiIhETsVHREQi\np+IjIiKR2+/iY2ZHm9k+b2kjIiKyLxUqPmYWM7PaZlYHeBd40sz+XLmhiYhIsqroyCfT3TcD5wNP\nuntnQDfsFBGRA1LR4pNmZg2Ai4CXKzEeEZEq4d5776VNmza0bduWwYMH87///Y+ZM2fSqVMnsrOz\nOfnkk4uf1fPnP/+Z1q1b0759e3r16sUnn3wS5+irvooWn9uA14D/uvt/zKwZsLLywhIRiZ+1a9dy\n//33s3DhQpYuXcrOnTuZNGkSV155Jc8++yy5ubn8+Mc/5vbbbwegY8eOLFy4kCVLlnDBBRcwcuTI\nOGdQ9VXoxqLuPhmYXGL5Y2BgZQVVHjO7BriS4CmmF7r7W/vo+0vgV8BOoAAY6u7LyvuM7Tt20nTU\nK4co4qpreLtCLlOeSSEVcoTKzzNvTD8ACgsL2b59O4cddhjbtm0jKysLM2Pz5s0AfP3112RlZQFw\n2mmnFW/fvXt3nnnmmUqLL1lUqPiY2QnAQ0B9d28bXu3W391vr9ToynYV0Be4FPghUGbxAf7u7g8D\nmFl/4M9An0qPUEQSVsOGDbnuuuto3Lgx6enp9O7dm969e/PYY49x1llnkZ6eTu3atUt9BPbjjz9O\n37594xB1YjF3L7+T2WxgBPCIu3cM25a6e9tKjq+0WB4GfgZ8A9QgeJrpl8DV7r73A9R333YwcIm7\nl/qdYWZDgaEAdevW6zx63IRDGXqVVD8dvtge7ygqXyrkmQo5QuXn2a5hJlu2bOHmm29m9OjR1KpV\ni1tuuYWePXsyd+5cBg0aROvWrZk0aRKrV69mxIgRxdvOmDGDF154gXHjxlGjRo2DiqOgoIBatWod\nbDqRO+200xa5e5fy+lX0eT4Z7r7AzEq2FR5QZAfJ3X9pZn2ALsAwoMDd79nXNmb2K+C3BMXq9H3s\n+1HgUYDGzZr72PeS5nFHZRrerhDlmRxSIUeo/DzzhuQwefJkOnbsyHnnnQfAunXrmDdvHmvXruWq\nq64CoFmzZvTp04ecnBwAXn/9daZOncrs2bM55phjDjqOWCxWvO9kVNEjuMHMjgccwMwuAD6rtKgO\nMXcfD4w3sx8DNxFM1+1T+mHVWRHO/SazWCxG3pCceIdR6VIhz1TIEaLJs3HjxsyfP59t27aRnp7O\nzJkz6dKlC5MnT+bDDz/khBNOYMaMGbRq1QqAxYsXc8UVV/Dqq68eksKTCipafH5FMCI40czWAquA\nIZUWVeWZRHDuSkSkTN26deOCCy6gU6dOpKWl0bFjR4YOHUqjRo0YOHAg1apV4+ijj+aJJ54AYMSI\nERQUFHDhhRcCQfF68cUX45lClVdu8TGzakAXd/+RmdUEqrn7lsoPrUK2EFzxViYza+HuRZeF90OX\niItIBdx6663ceuutu7UNGDCAAQMG7NX39ddfjyqspFHu3/m4+y6Ccyu4+9YqVHgAXgIGmFmumZ1S\nRp9hZva+meUSnPcpd8pNREQqV0Wn3WaY2XXAc8DWokZ3/6pSoiqHuzcN324A9nmTU3e/ttIDEhGR\n/VLR4vOz8N9flWhzoNmhDUdERFJBRe9wcFxlB3KwzOxG4MI9mie7+x3xiEdERMpW0TscXFJau7s/\ndWjDOXBhkVGhERFJABWddvtBifdHAL2Ad4AqU3xERCRxVHTa7eqSy2aWCTxdKRGJiEjS2+/HaIe2\nAS0OZSAiIpI6KnrO5yXCW+sQFKzWlHjEgoiIyP6o6DmfkjfuLAQ+cfc1lRCPiIikgIpOu53l7rPD\n15vuvsbM7qrUyEREJGlVtPicUUqbnpYkIiIHZJ/TbmZ
2JcFTQ5uZ2ZISq44E3qzMwEREJHmVd87n\n78C/gDuBUSXat8Trvm4iIpL49ll83P1r4GtgMICZHUPwR6a1zKyWu39a+SGKiEiyqdA5HzM7x8xW\nEjxEbjaQRzAiEhGJxIoVK8jOzubyyy8nOzub2rVrM27cOHJzc+nevTvZ2dl06dKFBQsWAPD1119z\nzjnn0KFDB9q0acOTTz4Z5wykpIpecHA70B34MLzJaC90zkdEItSyZUtyc3N57LHHWLRoERkZGQwY\nMICRI0dy8803k5uby2233cbIkSMBGD9+PK1bt+bdd98lFosxfPhwvv322zhnIUUq+nc+O9x9o5lV\nM7Nq7j4r2S+13r5jJ01HvRLvMCrd8HaFXKY8k0Iy55g3pt9uyzNnzuT444+nSZMmmBmbN28GgtFO\nVlYWAGbGli1bcHcKCgqoU6cOaWkV/ZEnla2iRyLfzGoBc4FnzWw9wR+b7pOZ/QS4BqgBvA38EXgd\n6AF8RTCF9wfgQ+DVsE/HcPkSd99Wxn5HA+cA6cBbwBXAicDf3L1r2Kcp8KK7tzezs4A/Ezx87h2g\nmbufXcHcRaSKmTRpEoMHDwZg3LhxnHnmmVx33XXs2rWLt956C4Bhw4bRv39/srKy2LJlC8899xzV\nqh3oHcXkUDN3L7+TWU1gO8E03RAgE3jW3TfuY5tWwJ+A8919h5n9BZhPUIj6EBSa5u5+RVgoVgEn\nu/ubZvYEsMzd7ylj33WKrrYzs6eBf7j7S+Gjss9394/N7HfAYQR3Z1gJnOruq8xsInBkacXHzIYC\nQwHq1q3XefS4CeV+bRJd/XT4Ynu8o6h8qZBnMufYrmFm8ftNmzZx2WWX8eSTT1KnTh3uv/9+OnTo\nQM+ePZk1axYvv/wyY8eOZfbs2SxdupSrrrqKdevWcd111/HYY49Rs2bNOGZScQUFBdSqVSveYey3\n0047bZG7dymvX4WKD4CZNQFauPvrZpYBVHf3LfvoPwy4AVgfNqUDE939FjN7DWgOZLv7lrD4zHH3\nxuG2pwPXuPt5Zex7IDASyADqAA+4+xgzuwHYFb5/B7gYqAnc5+49w237A0PLG/k0btbcq110XwW+\nMolteLtCxr6X/FMRqZBnMudYctrt9ttvZ86cOfz73/8GIDMzk/z8fMwMdyczM5PNmzfTr18/Ro0a\nxSmnnALA6aefzpgxY+jatWtccthfsViMnJyceIex38ysQsWnole7/QJ4HngkbGoI/LO8zQimwbLD\nV8uw8GQAjcI+Jcv6nlWw1KpoZkcAfwEucPd2wASCy78BngMuMrMTAHf3lWEcIpIk3njjjeIpN4Cs\nrCxmz55dvK5Fi+CG+40bN2bmzJkAfPHFF6xYsYJmzZpFH7CUzt3LfQG5BNNli0u0vVfONq0JpruO\nCZfrAE2ABwhGREOAl8N1TQmKTY9weQIwvIz9HgV8QTCSqgUsBW4psf4/BM8aGhkupwOrgabh8rNF\nn7uv1wknnOCpYNasWfEOIRKpkGcq5Lh161avXbu25+fnF7fNnTvXO3Xq5O3bt/euXbv6woUL3d19\n7dq1fsYZZ3jbtm29TZs2/vTTT8cr7AOSqMcTWOgVqCsVHaN/4+7fmgWDCDNLo4yRSYmitszMbgL+\nbWbVgB3AbwmeinqSu+80s4Fm9lNgFrAcuNTMHgmL1kNl7DffzCYA7xH8vdF/9ujyHHA3cFzYf7uZ\nXQW8amYbgAUVzFlEqpiMjAymTZtGZuZ354BOPvlkFi1atFffrKys4qk5qXoqWnxmh+dT0s3sDIL7\nvb1U3kbu/hxBMSipe4n150PxlWm73P2XFQnG3W8Cbipj3T3s/ggIgFnufqIF1XM8sLAinyMiIpWj\notcdjgK+JBhtXAFMp4wf/lXUL8Ir4d4nuFLvkXL6i4hIJSrvrtaN3f1Td99FcB6mUq49dvc8oG0p\nn/8C4fRZCb9z99f
2c//3AvcecIAiInJIlTft9k+gE4CZTXH3gZUf0nfcfUCUnyciItEob9qt5GXK\nukZRREQOifKKj5fxXkRE5ICVN+3Wwcw2E4yA0sP3hMvu7rUrNToREUlK5T1MrnpUgYiISOrQLV5F\nRCRyKj4iIhI5FR8REYmcio+IiEROxUdERCKn4iMiIpFT8RERkcip+IhIlbZixQqys7OLX/369WPc\nuHHk5ubSvXt3srOz6dKlCwsWBI/qcneuueYamjdvTvv27XnnnXfinIGUJjkf+C4iSaNly5bk5uYC\nsHPnTurVq8eAAQP4xS9+wc0330zfvn2ZPn06I0eOJBaL8a9//YuVK1eycuVK3n77ba688krefvvt\nOGche0rI4mNm1wBXArWBC939rQpscwEwGfiBu5f7MLntO3bSdNQrBx1rVTe8XSGXKc+kkGw55o3p\nt1fbzJkzycrKokmTJpgZmzcHd/z6+uuvycrKAmDatGlccsklmBndu3cnPz+fzz77jAYNGkQav+xb\nQhYfgiep9gUuBX4I7LP4mNmRwDWAfv0RSWCTJk2iV69eAIwbN44zzzyT6667jl27dvHWW8GPgbVr\n13LssccWb9OoUSPWrl2r4lPFJFzxMbOHCR7vsASoAWwws58AV7v73DI2+wPwJ+C6cvY9FBgKULdu\nPUa3KzxkcVdV9dOD35iTXSrkmWw5xmKx3ZZ37NjBlClTePDBB4nFYtx///38/Oc/p2fPnsyaNYvz\nzz+fsWPHsmHDBhYvXkxhYfC12LRpE4sWLaKgoCAOWRy4goKCvb4GycTcE+9JCWaWB3QBhgEF7n7P\nPvp2BG5y94FmFgOuq8i0W+Nmzb3aRfcdooirruHtChn7XsL9DrLfUiHPZMtxz2m3adOmMX78eG64\n4QZycnLIzMwkPz8fM8PdyczMZPPmzVxxxRXk5OQwePBgIDhnFIvFEm7kE4vFyMnJiXcY+83MFrl7\nl/L6JfXVbmZWjeDx2cPjHYuIHJyJEycWFxSArKwsZs+eDcAbb7xBixYtAOjfvz9PPfUU7s78+fPJ\nzMxMuMKTCpLn16TSHQm0BWJmBvB94EUz61/e6Cf9sOqsKOWEZ7KJxWLkDcmJdxiVLhXyTOYct23b\nxowZM3jkkUdYvHgxABMmTODaa6+lsLCQI444gkcffRSAs846i+nTp9O8eXMyMjJ48skn4xm6lCHR\ni88WgiveSuXuXwN1i5b3Z9pNRKqOjIwMNm7cuFvbySefzKJFi/bqa2aMHz8+qtDkACX6tNtLwAAz\nyzWzU+LMwYU/AAANtElEQVQdjIiIVExCjnzcvWn4dgPQfj+2y6mMeEREZP8k+shHREQSUEKOfEpj\nZjcCF+7RPNnd74hHPCIiUrakKT5hkVGhERFJAJp2ExGRyKn4iIhI5FR8REQkcio+IiISORUfERGJ\nnIqPiIhETsVHREQip+IjIiKRU/EREZHIqfiIiEjkVHxE5JBZsWIF2dnZxa/atWszbtw4Jk+eTJs2\nbahWrRoLF+7+OK0777yT5s2b07JlS1577bU4RS5RS4p7u5lZgbvXinccIqmuZcuW5ObmArBz504a\nNmzIgAED2LZtG1OnTuWKK67Yrf+yZcuYNGkS77//PuvWreNHP/oRH374IdWrV49H+BKhpCg+lWH7\njp00HfVKvMOodMPbFXKZ8kwK8c4xb4/Hzs+cOZPjjz+eJk2alLnNtGnTGDRoEIcffjjHHXcczZs3\nZ8GCBfTo0aOyw5U4S6ppNwvcbWZLzew9M7s4bM8xs5iZPW9mH5jZs2Zm8Y5XJJlNmjSJwYMH77PP\n2rVrOfbYY4uXGzVqxNq1ays7NKkCkm3kcz6QDXQA6gL/MbM54bqOQBtgHfAmcBLw/0pubGZDgaEA\ndevWY3S7wojCjp/66cFvzMkuFfKMd46xWKz4/Y4dO5gyZQpnn332bu35+fksWrSIg
oICANasWcPy\n5cuL+3z22We8//771K1bt8zPKSgo2G2fySrZ80y24nMyMNHddwJfmNls4AfAZmCBu68BMLNcoCl7\nFB93fxR4FKBxs+Y+9r1k+/LsbXi7QpRncoh3jnlDcorfT5s2jW7dunH++efv1ueoo46ic+fOdOnS\nBYB58+YBkJMTbHvnnXfSu3fvfU67xWKx4v7JLNnzTKppN2BfU2nflHi/k+QrvCJVxsSJE8udcgPo\n378/kyZN4ptvvmHVqlWsXLmSrl27RhChxFuy/QCeA1xhZn8D6gCnAiOAE/d3R+mHVWfFHidQk1Es\nFtvtN9ZklQp5VpUct23bxowZM3jkkUeK21544QWuvvpqvvzyS/r160d2djavvfYabdq04aKLLqJ1\n69akpaUxfvx4XemWIpKt+LwA9ADeBRwY6e6fm9l+Fx8ROTAZGRls3Lhxt7YBAwYwYMCAUvvfeOON\n3HjjjVGEJlVIUhSfor/xcXcnGOmM2GN9DIiVWB4WYXgiIrKHZDvnIyIiCUDFR0REIqfiIyIikVPx\nERGRyKn4iIhI5FR8REQkcio+IiISORUfERGJnIqPiIhETsVHREQip+IjIiKRU/EREZHIqfiIiEjk\nVHxERCRyKj4iKS4/P58LLriAE088kVatWjFv3jx+//vf0759e7Kzs+nduzfr1q0DYNOmTQwYMID2\n7dvTtWtXli5dGufoJVElVfExs4Jy1h9lZldFFY9IIrj22mvp06cPH3zwAe+++y6tWrVixIgRLFmy\nhNzcXM4++2xuu+02AP74xz+SnZ3NkiVLeOqpp7j22mvjHL0kqqR4mNx+OAq4CvhLeR2379hJ01Gv\nVH5EcTa8XSGXKc+ksL855o3px+bNm5kzZw5//etfAahRowY1atTYrd/WrVsxMwCWLVvG9ddfD8CJ\nJ55IXl4eX3zxBfXr1z80SUjKSKqRTxEzq2VmM83sHTN7z8zODVeNAY43s1wzuzueMYpUBR9//DH1\n6tXjpz/9KR07duTyyy9n69atQPB462OPPZZnn322eOTToUMHpk6dCsCCBQv45JNPWLNmTdzil8SV\nlMUH+B8wwN07AacBYy341W0U8F93z3b3Efvcg0gKKCws5J133uHKK69k8eLF1KxZkzFjxgBwxx13\nsHr1aoYMGcKDDz4IwKhRo9i0aRPZ2dk88MADdOzYkbS0VJtAkUMhWb9rDPijmZ0K7AIaAuXOC5jZ\nUGAoQN269RjdrrBSg6wK6qcH0zXJLhXy3N8cY7EYX331FXXr1mX79u3EYjGOP/54/v73v9OrV6/i\nfscddxzXX389p512GgCXXnopl156Ke7O4MGDWbNmDZs2bTrk+ZSloKCAWCwW2efFS7LnmazFZwhQ\nD+js7jvMLA84oryN3P1R4FGAxs2a+9j3kvXL853h7QpRnslhf3PMG5IDwL333kuDBg1o2bIlsViM\nU045hYYNG9KiRQsAHnjgATp37kxOTg75+flkZGRQo0YNJkyYQO/evenXr19lpFOmWCxGTk5OpJ8Z\nD8meZ7L+b8wE1oeF5zSgSdi+BTiyIjtIP6w6K8ZE+58qHmKxWPEPoWSWCnkeaI4PPPAAQ4YM4dtv\nv6VZs2Y8+eSTXH755axYsYJq1arRpEkTHn74YQCWL1/OJZdcQvXq1WndujWPP/74Ic5CUkWyFp9n\ngZfMbCGQC3wA4O4bzexNM1sK/EvnfUQgOzubhQsX7tY2ZcqUUvv26NGDlStXRhGWJLmkKj7uXiv8\ndwPQo4w+P440KBER2UuyXu0mIiJVmIqPiIhETsVHREQip+IjIiKRU/EREZHIqfiIiEjkVHxERCRy\nKj4iIhI5FR8REYmcio+IiEROxUdERCKn4iMiIpFT8RERkcip+IiISORUfEREJHIqPiIiEjkVHxER\niZyKj4iIRE7FR0REImfuHu8YqiQz2wKsiHccE
agLbIh3EBFIhTxTIUdQnlVdE3evV16ntCgiSVAr\n3L1LvIOobGa2UHkmh1TIEZRnstC0m4iIRE7FR0REIqfiU7ZH4x1ARJRn8kiFHEF5JgVdcCAiIpHT\nyEdERCKn4iMiIpFT8dmDmfUxsxVm9pGZjYp3PAfDzI41s1lmttzM3jeza8P2OmY2w8xWhv8eHbab\nmd0f5r7EzDrFN4P9Y2bVzWyxmb0cLh9nZm+HeT5nZjXC9sPD5Y/C9U3jGff+MLOjzOx5M/sgPK49\nkvF4mtlvwu/ZpWY20cyOSIbjaWZPmNl6M1taom2/j5+ZXRr2X2lml8Yjl4Ol4lOCmVUHxgN9gdbA\nYDNrHd+oDkohMNzdWwHdgV+F+YwCZrp7C2BmuAxB3i3C11DgoehDPijXAstLLN8F3BvmuQn4edj+\nc2CTuzcH7g37JYr7gFfd/USgA0G+SXU8zawhcA3Qxd3bAtWBQSTH8fwr0GePtv06fmZWB7gZ6AZ0\nBW4uKlgJxd31Cl9AD+C1EsvXA9fHO65DmN804AyCOzc0CNsaEPxBLcAjwOAS/Yv7VfUX0IjgP+7p\nwMuAEfx1eNqexxZ4DegRvk8L+1m8c6hAjrWBVXvGmmzHE2gIrAbqhMfnZeDMZDmeQFNg6YEeP2Aw\n8EiJ9t36JcpLI5/dFX3TF1kTtiW8cCqiI/A2UN/dPwMI/z0m7JbI+Y8DRgK7wuXvAfnuXhgul8yl\nOM9w/ddh/6quGfAl8GQ4vfiYmdUkyY6nu68F7gE+BT4jOD6LSL7jWWR/j19CHtc9qfjszkppS/hr\n0c2sFjAF+LW7b95X11Laqnz+ZnY2sN7dF5VsLqWrV2BdVZYGdAIecveOwFa+m6IpTULmGU4hnQsc\nB2QBNQmmoPaU6MezPGXllRT5qvjsbg1wbInlRsC6OMVySJjZYQSF51l3nxo2f2FmDcL1DYD1YXui\n5n8S0N/M8oBJBFNv44CjzKzo/oUlcynOM1yfCXwVZcAHaA2wxt3fDpefJyhGyXY8fwSscvcv3X0H\nMBX4Icl3PIvs7/FL1OO6GxWf3f0HaBFeVVOD4CTni3GO6YCZmQGPA8vd/c8lVr0IFF0hcynBuaCi\n9kvCq2y6A18XTQdUZe5+vbs3cvemBMfsDXcfAswCLgi77ZlnUf4XhP2r/G+O7v45sNrMWoZNvYBl\nJNnxJJhu625mGeH3cFGeSXU8S9jf4/ca0NvMjg5Hib3DtsQS75NOVe0FnAV8CPwXuDHe8RxkLicT\nDMeXALnh6yyC+fCZwMrw3zphfyO42u+/wHsEVxvFPY/9zDkHeDl83wxYAHwETAYOD9uPCJc/Ctc3\ni3fc+5FfNrAwPKb/BI5OxuMJ3Ap8ACwFngYOT4bjCUwkOI+1g2AE8/MDOX7Az8J8PwJ+Gu+8DuSl\n2+uIiEjkNO0mIiKRU/EREZHIqfiIiEjkVHxERCRyKj4iIhK5tPK7iMihYmY7CS6bLXKeu+fFKRyR\nuNGl1iIRMrMCd68V4eel+Xf3QxOpMjTtJlKFmFkDM5tjZrnhs2xOCdv7mNk7Zvaumc0M2+qY2T/D\nZ73MN7P2YfstZvaomf0beMqC5xzdbWb/CfteEccURQBNu4lELd3McsP3q9x9wB7rf0zwqIA7wudL\nZZhZPWACcKq7rwqf5wLBXQAWu/t5ZnY68BTBHRAAOgMnu/t2MxtKcGuWH5jZ4cCbZvZvd19VmYmK\n7IuKj0i0trt79j7W/wd4Irwh7D/dPdfMcoA5RcXC3YtumnkyMDBse8PMvmdmmeG6F919e/i+N9De\nzIrui5ZJ8IAyFR+JGxUfkSrE3eeY2alAP+BpM7sbyKf0W+bv69b6W/fod7W7J97NJyVp6ZyPSBVi\nZk0Ink00geCO5J2AeUBPMzsu7FM07TYHGBK25QAbvPTnNb0GXBmOpjCzE8KH0InEjUY+IlVLDjDC\nzHYABcAl7
v5leN5mqplVI3jeyxnALQRPNV0CbOO72/Lv6TGCRze/Ez6i4EvgvMpMQqQ8utRaREQi\np2k3ERGJnIqPiIhETsVHREQip+IjIiKRU/EREZHIqfiIiEjkVHxERCRy/x8lCdQ6O+9uLQAAAABJ\nRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "xgb.plot_importance(x_model, importance_type = 'weight')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }